Abstract - A new class of architectures for an all-digital modem is presented 
in this report. This architecture, referred to as the parallel receiver (PRX), is 
based on employing multirate digital filter banks (DFBs) to demodulate, 
track, and detect the received symbol stream. The resulting architecture is 
derived, and specifications are outlined for designing the DFB for the PRX. 
The key feature of this approach is a lower processing rate than either the 
Nyquist rate or the symbol rate, without any degradation in the symbol error 
rate. Due to the freedom in choosing the processing rate, the designer is able 
to arbitrarily select and use digital components, independent of the speed of 
the integrated circuit technology. PRX architecture is particularly suited for 
high data rate applications, and due to the modular structure of the parallel 
signal path, expansion to even higher data rates is accommodated with ease. 
Applications of the PRX would include gigabit satellite channels, multiple 
spacecraft, optical links, interactive cable-TV, telemedicine, code division 
multiple access (CDMA) communication, and others. 
I. Motivation 


With the evolution of high speed satellite and terrestrial communications, the 
applications of high data rate communication systems are becoming abundant. 
Existing earth orbital missions such as the Tracking and Data Relay 
Satellite System (TDRSS) support data rates of up to 300 Mbps. Communication 
systems today must process data faster and handle an ever-rising data 
throughput. 

Advances in digital integrated circuit (IC) technology have made switching 
speeds close to 1 GHz possible. However, the widespread use of high speed 
components is costly both in price and power consumption. One of the key 
bottlenecks in DSP design for an all-digital receiver is the availability of 
components (e.g. multiply-accumulator) that process each sample at the input 
sampling rate, when the latter exceeds 200 MHz or so. The objective here is 
to explore a cost effective solution to this problem. The ideal solution is to 
employ lower speed (50-70 MHz) components using IC technologies such as 
the Complementary Metal Oxide Semiconductor (CMOS) technology. CMOS 
has many known advantages such as low cost, low power, and high density. 
The data acquisition technology also has undergone rapid advancements, 
where today, one giga-sample per second analog-to-digital (A/D) converters 
are emerging. By using a single high speed A/D component and a low 
number of high speed components (e.g. multiplexers only), a fundamental 
question is posed: 

Is it possible to architect a digital receiver such that the processing rate is 
slower than both the sampling and the symbol rate? 

The answer to the above question is "yes". In this work, we devise a new 
approach for designing a digital receiver that trades off processing rate with 
parallelism. Our presentation is largely based on the evolving disciplines of 
multirate signal processing and digital filter bank (DFB) theory. Classically, 
the filter banks have been used for subband coding applications. Using the 
filter bank theory for designing the digital receiver, the resulting system is an 
all-purpose receiver which is suited for a variety of different modulation 
formats for high data rate channels. Our general approach is also suited for 
other multi-channel communication applications such as multi-carrier 
modulation systems, multiple spacecraft communication (or users), and 
spread spectrum communication systems. 

Another important by-product of our approach is that the overall architecture 
of the receiver is modular in the sense that if a higher data rate is desired, 
the same hardware (at IC or board level) may be replicated and deployed 
without the need to redesign the whole system. In our previous work [1], we 
succeeded in formulating a parallel digital phase locked loop (PDPLL). This 
work formed the basis of the results presented here, and it was expanded to 
provide a cohesive approach to designing a digital receiver. 
II. Introduction 


With the rapid growth of the VLSI technology coupled with the flexibilities of 
digital signal processing techniques, a key objective in the design of the 
receiver is to implement the system digitally. However, it is generally not 
feasible (even though more desirable) to sample the received signal directly at 
the radio frequency (RF). Thus, we assume the availability of an 
intermediate stage for open loop down-conversion of the RF signal to a 
convenient frequency for the A/D conversion, referred to as the intermediate 
frequency (IF). 

An all-digital receiver is understood here as a receiver that performs 
demodulation, matched filtering, carrier synchronization, and symbol timing 
recovery. All other required functions (e.g. lock indicators, power estimators, 
etc.) in a conventional receiver use either the in-phase and quadrature 
components, or the output of the matched filters derived from these 
components. The in-phase and quadrature components of the receiver form 
the sufficient statistics for other estimators and lock detectors. Thus, we do 
not outline the implementation of these functions. The merits of an all- 
digital approach versus the analog implementation are widely known. A new 
generation of all-digital systems has been successfully deployed in NASA's 
Deep Space Network [2]. These receivers support a symbol rate of up to a few 
mega-symbols per second. Here, we propose a new approach for the design of 
the future generation of receivers, that could be potentially used for high data 
rate applications up to giga-symbols per second. 

A typical digital receiver is shown in Fig. 1. The input signal x(t) is sampled 
and converted to a discrete time sequence by an analog-to-digital (A/D) 
converter with a uniform sampling period of $T_s$ seconds. The output of the 
matched filter in this block diagram is the estimated symbol sequence 
(complex or real). In a coded communication system, this sequence is used by 
the channel decoder (e.g., Viterbi decoder) to estimate the transmitted bit 
stream. 
The symbol duration hereon is denoted as $T$ seconds, the sampling period is 
denoted by $T_s$ seconds, and the single-sided input bandwidth of the 
equivalent lowpass prefilter prior to A/D conversion in the receiver is denoted 
by $W$ in Hz as shown in Fig. 2. We assume a soft decision symbol output for 
generality. 

Some fundamental choices we made in the architecture of the digital receiver 
depicted in Fig. 1 are: 

1. The receiver employs bandpass sampling for A/D conversion of the 
received signal. It is noted that there are other alternative versions of this 
I&Q receiver structure. A widely used alternative (baseband sampling) is 
to perform the demodulation in the analog domain, prior to A/D 
conversion, and then sample the I&Q components separately in the 
baseband, using two separate A/Ds. In bandpass sampling, a single A/D 
converter is used. In reference [3], these alternative approaches are 
investigated, and it is concluded that bandpass sampling is more suited 
for space communication. In bandpass sampling (see Fig. 2), the 
minimum sampling rate is $f_s = 4W$ and the center frequency of the 
anti-aliasing filter prior to the A/D conversion is positioned at 
$f_c^{IF} = (2k + 1)W$ for some integer $k$. Usually the integer $k$ is chosen to 
result in a convenient center frequency for designing the anti-aliasing 
filter. Due to practical limitations in filter design [e.g. surface acoustic 
wave (SAW)], off-the-shelf filters are only available in a finite range of 
center frequencies. 

2. The phase tracking loop is closed in the digital domain. The merits for 
this loop closure versus the digital-to-analog conversion and closing the 
loop in the IF section are discussed in [3]. 
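
The bandpass sampling relation in item 1 can be checked numerically. The following is a minimal sketch, not part of the original report: the helper name `aliased_frequency` and the 100 MHz bandwidth are illustrative assumptions. It suggests that, with $f_s = 4W$, every center frequency of the form $(2k+1)W$ folds to $\pm f_s/4$ after sampling.

```python
def aliased_frequency(f, fs):
    """Fold a frequency f (Hz) into the interval [-fs/2, fs/2)."""
    return (f + fs / 2.0) % fs - fs / 2.0

W = 100e6            # assumed single-sided bandwidth in Hz (illustrative value)
fs = 4 * W           # minimum bandpass sampling rate, f_s = 4W

for k in range(4):
    f_if = (2 * k + 1) * W                      # candidate IF center frequency
    print(k, f_if / 1e6, aliased_frequency(f_if, fs) / 1e6)
```

Under these assumptions the sampled carrier always sits at a quarter of the sampling rate, so a digital mixing sequence at that frequency cycles through only four values.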

It is noted that our approach can also be applied with minor modification 
when baseband sampling is employed for A/D conversion, or the phase 
tracking loop is closed in the analog domain. The processing rate in our 
scheme is not limited by the minimum sampling rate, as exhibited later in 
this report. In the conventional approach, the processing rate (shown as a 
one sided arrow in the bottom left hand side of Fig. 1) in the digital signal 
processing building blocks following the A/D converter is at the minimum 
processing rate $1/(2T_s)$. We seek to parallelize the structure shown in Fig. 1 
such that the processing rate throughout the receiver can be arbitrarily 
selected by the designer. This selection is only limited by the amount of 
resources (hardware) available for fabricating the receiver, and it is not 
dictated by the input sampling rate or the speed of the ICs used in the signal 
path. 


The architecture of the PRX derived in this paper is of the form depicted in Fig. 3. 
Here $M$ denotes the decimation rate in each subband. In this architecture the 
input signal is parallelized into $2M$ separate signal paths. The input signal is 
filtered using a Discrete Fourier Transform (DFT) based analysis and 
synthesis filter bank, augmented with the parallel equivalent of the matched 
filtering operation. The resulting output of the overall system is the detected 
output symbol sequence. The key feature of this implementation is the 
parallelization of the input signal and processing of the input samples at the 
rate of $1/(MT_s)$, illustrated in Fig. 3 as a single sided arrow. In the following, 
we outline the derivation and the design of the filter banks for transforming 
the structure shown in Fig. 1 into the final form depicted in Fig. 3. 


II.1 Outline 

Section II.2 begins with an introductory section on detection of signals in 
an additive white Gaussian noise (AWGN) channel. Section II.3 contains some 
relevant results from multirate signal processing that are used in the sequel. 
Sections II.2 and II.3 may be skipped by those readers familiar with 
these subjects. Section III describes the design and architecture for the 
parallel implementation of the demodulation and filtering using multirate 
filter banks. In Section IV, the derivation of and our approach to combined 
digital matched filtering and demodulation are discussed. Section IV.2 
contains a design approach for digital matched filtering using interpolated 
finite impulse response filtering; this section is independent of the other 
sections in Section IV and may be used independently of other results in this 
report. Sections V and VI contain the design for the symbol timing recovery 
and carrier tracking in the PRX, respectively. Section VII discusses 
alternative architectures. Section VIII contains a brief discussion of the 
processing delay of the PRX. In Section IX, simulation results for a 16-channel 
PRX are provided. Section X outlines some future directions for 
research in this area. Section XI includes some concluding remarks. The 
four appendices, namely A, B, C, and D, contain respectively the MathCad™ 
software for generation of the filter banks, interpolated finite impulse 
response (IFIR) filter design for matched filtering, the receiver block diagram, 
and the C-source code programs for generating the polyphase components of 
the filter banks used in simulation of the PRX. 


II.2 Detection of Signal in AWGN 

The received waveform $x(t) = s(t) + n(t)$ is composed of signal $s(t)$ plus noise 
$n(t)$, specifically 

$$x(t) = \sum_k a_k\, p(t - kT) \cos(2\pi f_c t + \theta) + n(t) \tag{1}$$

where $a_k \in U$ is the symbol sequence and for binary phase shift keying 
(BPSK) $U = \{-1, +1\}$. We assume full response signaling where the pulse 
shape $p(t)$ satisfies $p(t) = 0$ for $t \notin [0, T)$. Here $n(t)$ is additive white Gaussian 
noise (AWGN) with a single sided spectral density $N_0$. The optimum coherent 
receiver for this received signal is well known [5]. We outline here some of 
the results used in the sequel. 

The optimum receiver for detection of signals, with a known waveform, in 
AWGN is based on maximizing the cross correlation of the known waveform and 
the received waveform. Formally, it can be shown that maximizing the a 
posteriori probability $\Pr(a_k = \alpha_m \mid x(t),\ t \in [kT, (k+1)T))$ 
for equally likely and independent symbols results in maximizing the metric 
$\mu(\alpha_m)$ during the $k$-th signaling interval [5], where in general $m = 1, \ldots, |U|$ (the 
notation "$|\cdot|$" used on a set denotes the cardinality of the set). Formally, the 
metric $\mu(\alpha_m)$ is 


$$\mu(\alpha_m) = \mathrm{Re}\left[ e^{-j\theta} \langle x, s_m \rangle \right] \tag{2}$$

where Re[.] stands for taking the real part, and the inner product is defined 
by 
$$\langle x, s_m \rangle = \int_{kT}^{(k+1)T} x(t)\, s_m(t)\, dt \tag{3}$$

Both $x(t)$ and $s_m(t)$ are represented in complex baseband. Each possible 
transmitted signal waveform $s_m(t)$ is 

$$s_m(t) = \alpha_m\, p(t - kT) \quad \text{for each } \alpha_m \in U \tag{4}$$

for $t \in [kT, (k+1)T)$, and for BPSK $m = 1, 2$. The optimum detector formally 
computes (3) during the $k$-th signaling interval for each value of $\alpha_m$, and picks 
the maximum, i.e. the detected symbol is $\hat{a}_k = \arg\max_{\alpha \in U} \mu(\alpha)$. The matched filtering 
as defined in (3) for computing $\mu(\alpha_m)$ is depicted in Fig. 4. For BPSK signals 
with $a_k \in U$, the correlation is performed only once for each symbol, since $s_m(t)$ 
differs only in sign during each signaling interval. In the remaining part of this 
report, we use $\mu(a_k)$ to denote the matched filter output during the $k$-th 
signaling period. 

The carrier phase $\theta(t) = \theta$ in the received signal is a slowly varying random 
process and is estimated using a Costas loop [5]. This estimate is used in the 
voltage controlled oscillator (VCO) for generating the reference signal, 
$e^{j(2\pi f_c t + \hat{\theta})}$, as depicted in Fig. 5. The Costas loop inputs are the in-phase and 
quadrature components of the received signal, which respectively are 

$$y_I(t) \triangleq \mathrm{Re}\left[a_k\, p(t - kT) e^{j\theta} + n(t)\right], \qquad y_Q(t) \triangleq -\mathrm{Im}\left[a_k\, p(t - kT) e^{j\theta} + n(t)\right], \tag{5}$$

for $t \in [kT, (k+1)T)$. The Costas loop for tracking the phase of the BPSK 
signal is shown in Fig. 5a. Throughout this document, we assume that all 
signal paths are complex. The equivalency of the complex version and the 
classical model is evident from Fig. 5b. 

In the digital counterpart of the above formulation for the optimum receiver 
[6,7], when the time bandwidth product of the system is large (i.e., $WT \gg 1$, which is 
equivalent to a high sampling rate), the inner product in (3) can be approximated, 
up to a constant scale factor, by 

$$\langle x, s_m \rangle \approx \alpha_m \sum_{n \in \Gamma} x_n w_n \tag{6}$$
where $\Gamma = \{n : kT \le nT_s < (k+1)T\}$, $x_n = x(nT_s)$ and $w_n = p(nT_s - kT)$. The 
dimension of this summation, $N = |\Gamma|$, is the number of samples per symbol 
and is bounded by $N_{\max} = \lceil T/T_s \rceil$. For simplicity, assume that the symbol 
period is an integer multiple of the sampling period so $N = N_{\max} = T/T_s$. In a 
digital system, the input bandwidth $W$ and the sampling frequency $f_s$ must 
be chosen such that the number of samples per symbol $N > 2WT$. In the 
baseband model, the received signal is sampled and processed as shown in 
Fig. 6. An anti-aliasing filter, with bandwidth $W$ Hertz, is used for 
prefiltering the signal. Subsequently, an A/D converter converts the signal 
into a discrete time sequence $x_n = x(nT_s)$. Due to prefiltering and A/D 
conversion errors, particularly when the time bandwidth product of the 
system is small, the sampled signal waveform which must be used in 
equation (6) is different from the one derived from the ideal pulse shape $p(t)$. 
When the pulse shape $p(t)$ is bandlimited by the prefilter, the filtering 
operation manifests itself as amplitude distortion. In particular, this 
distortion is significant when using a rectangular pulse shape for non-return- 
to-zero (NRZ) or bi-phase (also referred to as Manchester) signals. In 
general, the effect of prefiltering, amplitude, or phase distortion can be 
compensated for by the discrete time sampled version of the matched filter as 
discussed in [7]. 
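
As an illustration of the sampled correlation in (6), the following is a minimal sketch for BPSK with a rectangular (NRZ) pulse. It is not the report's implementation: the function name `matched_filter_bpsk`, the choice of 8 samples per symbol, and the noise level are assumptions made for the example, and prefilter distortion is ignored.

```python
import numpy as np

def matched_filter_bpsk(x, pulse):
    """Correlate each block of N samples with the sampled pulse (one output per symbol)."""
    N = len(pulse)
    n_sym = len(x) // N
    blocks = np.reshape(x[:n_sym * N], (n_sym, N))
    mu = blocks @ pulse                    # mu[k] = sum over the k-th symbol of x_n * w_n
    return mu, np.where(mu >= 0, 1, -1)    # sign decision for U = {-1, +1}

rng = np.random.default_rng(0)
N = 8                                      # assumed samples per symbol, N = T / T_s
pulse = np.ones(N)                         # sampled rectangular pulse w_n
symbols = rng.choice([-1, 1], size=20)
clean = np.repeat(symbols, N).astype(float)
noisy = clean + 0.5 * rng.standard_normal(clean.size)

mu, detected = matched_filter_bpsk(noisy, pulse)
print(np.array_equal(detected, symbols))
```

Because the BPSK waveforms differ only in sign, a single correlation per symbol followed by a sign test is sufficient here.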

Bandpass sampling is employed in our approach as motivated earlier in this 
section. In Fig. 7, the input signal is prefiltered by the anti-aliasing filter 
with the analog frequency response $H_a(s)$. The filtered signal is converted by 
a single A/D converter to a discrete time sequence $x_n = x(nT_s)$. Demodulation 
is performed by the multiplication of the reference carrier signal and the 
input signal, which is then filtered by the lowpass filter $H^d(z)$ to reject the 
double frequency components [4]. In the remaining part of this report we 
drop the superscript $d$ and simply refer to this filter as $H(z)$. The output of 
this filter is used by the matched filter for detecting the transmitted symbols. 
Hence, the optimum detector is implemented in two separate 
stages, namely the demodulation and matched filtering stages. The purpose 
of the decimator at the output of the demodulation filter and a more detailed 
discussion of the structure in Fig. 7 can be found in Sections III and IV. 
Our results to parallelize the signal path and utilize an efficient receiver 
architecture can be expanded to various I&Q modulation formats, including 
higher dimensional modulation schemes for bandwidth efficient modulations 
such as multiple phase shift keying (MPSK), continuous phase modulation 
(CPM), and partial response signaling techniques. These other schemes could 
all be cast into a framework to use our methodology for parallelizing the 
receiver. 


II.3 Multirate Signal Processing and DFB Preliminaries 

In this section, a brief overview of the results used in this work from multirate 
signal processing is presented. Our notations and approach to multirate 
systems and filter bank theory follow [8]. 


Decimation and Expansion 

Decimation and expansion are basic operations in multirate digital systems, 
as shown in Fig. 8. The output of the decimator $x_d(n)$ and the output of the 
expander $x_e(n)$, in the time and frequency domains respectively, are [8]: 

$$x_d(n) = x(Mn), \qquad X_d(z) = \frac{1}{M} \sum_{k=0}^{M-1} X\left(z^{1/M} W_M^k\right),$$

$$x_e(n) = \begin{cases} x(n/L) & \text{if } n \text{ is a multiple of } L \\ 0 & \text{otherwise} \end{cases}, \qquad X_e(z) = X(z^L), \tag{7}$$

where $W_M = e^{-j2\pi/M}$. When expanding a signal, the original sequence is padded 
with $L-1$ zeros in the time domain between each sample of the original 
sequence, which is equivalent to compressing the original spectrum by a 
factor of $L$. This is immediately evident when $z$ is replaced by $e^{j\omega}$, as 
illustrated in Fig. 9. The process of decimation, or discarding $M-1$ out of every 
$M$ samples in the time domain, is equivalent to stretching the original spectrum $X(e^{j\omega})$ 
by an amount $M$, creating $M-1$ copies of this stretched version, shifting them 
uniformly by multiples of $2\pi$, and then adding the stretched and shifted 
versions (scaled by $1/M$). 
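
The spectral effects of expansion and decimation described above can be checked with the DFT. The following is a minimal sketch: the helper names `decimate` and `expand` and the test signal are assumptions for illustration, and the signal length is chosen to be a multiple of the rate so that the DFT bins line up exactly.

```python
import numpy as np

def decimate(x, M):
    """Keep every M-th sample: x_d(n) = x(Mn)."""
    return x[::M]

def expand(x, L):
    """Insert L-1 zeros between samples: x_e(n) = x(n/L) when L divides n, else 0."""
    y = np.zeros(len(x) * L, dtype=x.dtype)
    y[::L] = x
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(24)
M = L = 3
X = np.fft.fft(x)

# Expander: the DFT of x_e is the DFT of x repeated L times (compressed spectrum).
Xe = np.fft.fft(expand(x, L))
print(np.allclose(Xe, np.tile(X, L)))

# Decimator: the DFT of x_d is the average of M shifted segments of the DFT of x.
Xd = np.fft.fft(decimate(x, M))
N_d = len(x) // M
alias_sum = sum(X[k * N_d:(k + 1) * N_d] for k in range(M)) / M
print(np.allclose(Xd, alias_sum))
```

The second check is the discrete counterpart of the aliasing sum: the decimated spectrum is built from $M$ overlapped copies of the original, each scaled by $1/M$.
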
In this paper, all decimation and expansion rates are positive integers, and 
the notation $A(z)|_{\downarrow M}$ denotes the z-transform of the decimated sequence $a(nM)$ 
and $A(z)|_{\uparrow M}$ denotes the z-transform of the expanded sequence $a(n/M)$. 

Blocking 

Blocking of a discrete time sequence is the key to parallelizing digital 
filtering operation. The idea behind blocking of a discrete time sequence is 
illustrated in Fig. 10. 

This is referred to as the "commutator" model. The commutator is simply a 
switch (or a multiplexer) that rotates at a uniform rate and takes $M$ positions 
periodically. That is, each subsequence $x_i(n)$ can be written in terms of $x(n)$ as 
follows 

$$x_i(n) = x(nM + M - i), \tag{8}$$

for $i = 1, \ldots, M$. The blocking approach for the commutator model is shown for 
both the decimator and the expander model. The equivalency of the 
switching model and the delay chain operation is evident for clockwise and 
counterclockwise operation in each case. The "blocked version" is denoted by 
the vector 

$$\mathbf{x}_B(n) = \begin{bmatrix} x(nM + M - 1) \\ x(nM + M - 2) \\ \vdots \\ x(nM) \end{bmatrix} = \begin{bmatrix} x_1(n) \\ x_2(n) \\ \vdots \\ x_M(n) \end{bmatrix}. \tag{9}$$

Following the same notation as in [8], the z-transform of $\mathbf{x}_B(n)$ is defined as 

$$\mathbf{X}_B(z) = \sum_n \mathbf{x}_B(n) z^{-n}. \tag{10}$$


A useful result for our application in the demodulation stage is to 
interchange the operation of multiplication with the decimation operation 
as shown in Fig. 11. This commutative property is a direct consequence of 
blocking operation shown in Fig. 10. 
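
The blocking of (8) and the interchange of multiplication and decimation can be sketched as follows. This is an illustrative example only: the function name `block`, the test signal, and the quarter-rate mixing sequence are assumptions, not values taken from the report.

```python
import numpy as np

def block(x, M):
    """Return the M subsequences x_i(n) = x(nM + M - i), i = 1..M, as rows."""
    n_blocks = len(x) // M
    frames = np.reshape(x[:n_blocks * M], (n_blocks, M))   # frames[n, j] = x(nM + j)
    return frames[:, ::-1].T                                # row i-1 holds x(nM + M - i)

M = 4
n = np.arange(32)
x = np.cos(0.3 * n)
c = np.exp(-1j * np.pi * n / 2)        # assumed mixing (reference carrier) sequence

# Multiplying at the high rate and then decimating ...
high_rate = (x * c)[::M]
# ... gives the same samples as decimating both sequences and multiplying at the low rate.
low_rate = x[::M] * c[::M]
print(np.allclose(high_rate, low_rate))

xb = block(x, M)
print(np.array_equal(xb[M - 1], x[::M]))   # the last row is x_M(n) = x(nM)
```

The practical point is that each branch of the blocked signal can be mixed with its own decimated copy of the carrier, so no multiplier has to run at the input sampling rate.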


Jet Propulsion Laboratory 


PRX Report 


August 15, 1994 



Notation and Abbreviation for Multi-Input and Multi-Output Systems 
Let $H(z)$ denote the transfer function of an arbitrary digital filter, i.e. 

$$H(z) = \sum_n h(n) z^{-n}, \tag{11}$$

where $\{\ldots, h(-2), h(-1), h(0), h(1), h(2), \ldots\}$ is the impulse response of the filter. 
The input and output of this filter are related by $Y(z) = H(z)X(z)$, where $X(z)$ 
and $Y(z)$ are the z-transforms of the input and the output respectively. 

Boldface symbols represent matrices and vectors. The notations $\mathbf{A}^T$, $\mathbf{A}^*$, and 
$\mathbf{A}^{\dagger}$ represent respectively, transpose, conjugate, and transpose conjugate of 
$\mathbf{A}$. An M-input-M-output system with the transfer function matrix [8] 
$\mathbf{H}(z) = \sum_n \mathbf{h}(n) z^{-n}$ can be defined such that input and output of the system are 
related by $\mathbf{Y}_B(z) = \mathbf{H}(z)\mathbf{X}_B(z)$. We use the notation 

$$\tilde{\mathbf{H}}(z) = \mathbf{H}^{\dagger}(1/z^*) = \sum_n \mathbf{h}^{\dagger}(-n) z^{-n}, \tag{12}$$

which stands for transposition "T", followed by conjugation "*" of coefficients, 
followed by replacement of $z$ with $z^{-1}$. 

Polyphase Components 

Two important results from multi-rate signal processing are the Noble 
identity and the polyphase decomposition. It is possible to represent $H(z)$ as 
defined in (11) in terms of its $M$-component polyphase form 

$$H(z) = \sum_{l=0}^{M-1} z^{-l} E_l(z^M). \tag{13}$$

Here $E_l(z)$ is called the $l$-th polyphase component of $H(z)$. The sequence $e_l(n)$, 
the inverse z-transform of $E_l(z)$, is defined as follows 

$$e_l(n) = h(nM + l) = h_l(n), \tag{14}$$
with $0 \le l \le M-1$. The expansion in (13) is simply the decomposition of $\{h(n)\}$ 
into $M$ sub-sequences $e_l(n)$. For example, by grouping the impulse response 
coefficients $h(n)$ into even- and odd-numbered samples, i.e., $e_0(n) = h(2n)$ and 
$e_1(n) = h(2n+1)$, the transfer function $H(z)$ may be represented as 

$$H(z) = E_0(z^2) + z^{-1} E_1(z^2), \tag{15}$$

where 

$$E_0(z) = \sum_{n=-\infty}^{\infty} h(2n) z^{-n}, \tag{16}$$

and 

$$E_1(z) = \sum_{n=-\infty}^{\infty} h(2n+1) z^{-n}. \tag{17}$$


An important consequence of this representation is that when the polyphase 
component is followed by a decimation operation, then the filtering operation 
and the decimation can be commuted. This property, known as the Noble 
identity, is depicted in Fig. 12. 

Applying the Noble identity to the polyphase representation of (13), the filter 
H(z) followed by a decimator can be re-drawn, as shown in Fig. 13.b. In the 
model shown in Fig. 13. b, the processing rate in each polyphase component is 
a factor of M slower than the sampling clock. The polyphase representation 
results in an efficient rearrangement of the computations of the filtering 
operation. This effectively distributes the computations into a set of parallel 
filters operating at a lower speed. This in turn, reduces the speed constraints 
on the digital signal processing hardware, thereby enabling it to process 
samples at a much lower rate than the sampling rate. 
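
The rearrangement described above can be sketched in a few lines. The code below is an illustrative example rather than the report's implementation: the function names, the random test filter, and the decimation rate are assumptions. It compares filtering at the input rate followed by decimation with the equivalent bank of $M$ polyphase branches running at the low rate.

```python
import numpy as np

def decimating_filter_direct(x, h, M):
    """Filter at the input rate, then keep every M-th output sample."""
    return np.convolve(x, h)[::M]

def decimating_filter_polyphase(x, h, M):
    """Same output from M branch filters e_l(n) = h(nM + l), each at the low rate."""
    n_out = -(-(len(x) + len(h) - 1) // M)        # ceil((len(x) + len(h) - 1) / M)
    xp = np.concatenate([np.zeros(M - 1), x])     # xp[i] = x(i - (M - 1))
    y = np.zeros(n_out)
    for l in range(M):
        x_l = xp[M - 1 - l::M]                    # branch input x_l(m) = x(mM - l)
        e_l = h[l::M]                             # l-th polyphase component of h
        if e_l.size == 0:
            continue                              # filter shorter than M: empty branch
        branch = np.convolve(x_l, e_l)
        y[:len(branch)] += branch[:n_out]
    return y

rng = np.random.default_rng(2)
x = rng.standard_normal(50)
h = rng.standard_normal(11)
M = 4
print(np.allclose(decimating_filter_direct(x, h, M),
                  decimating_filter_polyphase(x, h, M)))
```

Each branch convolution touches only every $M$-th input sample, which is the sense in which the arithmetic runs a factor of $M$ slower than the sampling clock.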

The polyphase identity is depicted in Fig. 13.c. This identity is used when the 
filter $H(z)$ is preceded by an expander and then followed by a decimator, such 
that the expansion and decimation rates are equal. The cascaded system is a 
linear time-invariant (LTI) system. For verifying the LTI property note that 
the input to the decimator has the z-transform $X(z^M)H(z)$ and the output 
signal has the z-transform $[X(z^M)H(z)]|_{\downarrow M} = X(z)[H(z)|_{\downarrow M}] = X(z)E_0(z)$, where 
$E_0(z)$ is the 0-th polyphase component of $H(z)$. 
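
The polyphase identity can be verified numerically with a short sketch. The signal, filter, and rate below are arbitrary assumed values chosen only for the check.

```python
import numpy as np

M = 3
rng = np.random.default_rng(3)
x = rng.standard_normal(20)
h = rng.standard_normal(10)

xe = np.zeros(len(x) * M)
xe[::M] = x                               # expander by M
cascade = np.convolve(xe, h)[::M]         # filter H(z), then decimator by M

e0 = h[::M]                               # 0-th polyphase component, e_0(n) = h(nM)
direct = np.convolve(x, e0)               # single LTI filter E_0(z)

n = min(len(cascade), len(direct))
print(np.allclose(cascade[:n], direct[:n]))
```
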
Digital Filter Bank 

A digital filter bank [8] is a collection of digital filters, with a common input 
or a common output. Classically, applications of filter banks have mainly 
concentrated in the areas of signal compression. In these applications, the 
output of the system is a reconstructed signal $\hat{x}(n)$ from the subband signals. 
The $M$ decimated subband signals $x_k(n)$ are derived by filtering and decimating 
the input signal $x(n)$. Filter banks may typically have overlapping or non-overlapping 
bandwidths, depending on the desired characteristics. The 
system in Fig. 14 is called a maximally decimated [8] analysis / synthesis 
filter bank, and the set of filters $\{H_k(z),\ k = 0, \ldots, M-1\}$ are the analysis filters 
and the set of filters $\{F_k(z),\ k = 0, \ldots, M-1\}$ are the synthesis filters. 


The output z-transform of the filter bank can be written as a function of the 
z-transform of the input as follows: 

$$\hat{X}(z) = \frac{1}{M} \sum_{l=0}^{M-1} X(zW_M^l) \sum_{k=0}^{M-1} H_k(zW_M^l) F_k(z), \tag{18}$$

where $W_M = e^{-j2\pi/M}$. This can be written more compactly as 

$$\hat{X}(z) = \sum_{l=0}^{M-1} X(zW^l) A_l(z), \quad \text{where } A_l(z) = \frac{1}{M} \sum_{k=0}^{M-1} H_k(zW^l) F_k(z). \tag{19}$$

It is clear that if $A_l(z) = 0,\ \forall\, l > 0$, then aliasing is canceled and $\hat{X}(z) = T(z)X(z)$, 
where $T(z)$ is referred to as the distortion function and it is given by 

$$T(z) \triangleq \frac{1}{M} \sum_{k=0}^{M-1} H_k(z) F_k(z). \tag{20}$$

For a perfect reconstruction filter bank $T(z) = c z^{-n_0}$ for some constant $c$ and 
integer delay $n_0$. In this case the 
distortion function is simply a delay and a gain, and the system is free from 
aliasing, amplitude distortion and phase non-linearity (or distortion). 
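
The alias components of (19) and the distortion function of (20) can be evaluated directly from the filter coefficients. The following is a minimal sketch that uses a two-channel Haar filter bank as an assumed example (it is not a filter bank from the report); the helper names `modulate` and `alias_components` are likewise illustrative, and all filters are assumed to have equal length.

```python
import numpy as np

def modulate(h, l, M):
    """Coefficients of H(z W^l) with W = exp(-2j*pi/M): h(n) -> h(n) * W**(-l*n)."""
    n = np.arange(len(h))
    return h * np.exp(2j * np.pi * l * n / M)

def alias_components(analysis, synthesis):
    """Return [A_0(z), ..., A_{M-1}(z)] as coefficient arrays in powers of z^-1."""
    M = len(analysis)
    comps = []
    for l in range(M):
        terms = [np.convolve(modulate(h, l, M), f)
                 for h, f in zip(analysis, synthesis)]
        comps.append(sum(terms) / M)
    return comps

s = 1 / np.sqrt(2)
analysis = [s * np.array([1.0, 1.0]), s * np.array([1.0, -1.0])]    # H_0, H_1
synthesis = [s * np.array([1.0, 1.0]), s * np.array([-1.0, 1.0])]   # F_0, F_1

A = alias_components(analysis, synthesis)
print(np.round(A[0], 12))    # distortion function T(z) = A_0(z)
print(np.round(A[1], 12))    # alias term A_1(z)
```

For this assumed example the alias term vanishes and $T(z)$ reduces to a one-sample delay, which is the perfect reconstruction case with $c = 1$.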




The filter bank as depicted in Fig. 14, is called a maximally decimated filter 
bank when the number of subband filters is equal to the decimation and 
expansion rate M. The input and output of this system are related by the 
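polyphase decomposition of each subband filter. Let 

$$\mathbf{h}(z) = \begin{bmatrix} H_0(z) \\ \vdots \\ H_{M-1}(z) \end{bmatrix}, \qquad \mathbf{E}(z^M) = \begin{bmatrix} E_{00}(z^M) & E_{01}(z^M) & \cdots & E_{0,M-1}(z^M) \\ \vdots & \vdots & & \vdots \\ E_{M-1,0}(z^M) & E_{M-1,1}(z^M) & \cdots & E_{M-1,M-1}(z^M) \end{bmatrix}, \tag{21}$$

where $E_{ij}(z)$ denotes the $j$-th polyphase component of the $i$-th analysis filter 
$H_i(z)$. Accordingly, for the synthesis filter bank, define 

$$\mathbf{f}(z) = \begin{bmatrix} F_0(z) \\ \vdots \\ F_{M-1}(z) \end{bmatrix}, \qquad \mathbf{R}(z^M) = \begin{bmatrix} R_{00}(z^M) & R_{01}(z^M) & \cdots & R_{0,M-1}(z^M) \\ \vdots & \vdots & & \vdots \\ R_{M-1,0}(z^M) & R_{M-1,1}(z^M) & \cdots & R_{M-1,M-1}(z^M) \end{bmatrix}, \tag{22}$$

and let the delay chain be denoted by $\mathbf{e}(z)$ where 

$$\mathbf{e}(z) = \begin{bmatrix} 1 \\ z^{-1} \\ \vdots \\ z^{-M+1} \end{bmatrix}.$$

Then, we can write 

$$\mathbf{h}(z) = \mathbf{E}(z^M)\,\mathbf{e}(z) \tag{23}$$

$$\mathbf{f}^T(z) = \mathbf{e}^T(z^{-1})\,\mathbf{R}(z^M) \tag{24}$$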
The equivalent polyphase representation of the filter bank is illustrated in 
Fig. 15. 


In general, to design filter banks free of any distortion, the perfect reconstruction 
property is desired. The necessary and sufficient condition [8] for perfect 
reconstruction is that $\mathbf{R}(z)$ and $\mathbf{E}(z)$ must be of the form 

$$\mathbf{R}(z)\mathbf{E}(z) = c z^{-m_0} \begin{bmatrix} \mathbf{0} & \mathbf{I}_{M-r} \\ z^{-1}\mathbf{I}_r & \mathbf{0} \end{bmatrix} \tag{25}$$

for some $r$ such that $0 \le r \le M-1$. It is noted here that when the polyphase 
matrix $\mathbf{E}(z)$ is paraunitary, it satisfies 

$$\tilde{\mathbf{E}}(z)\mathbf{E}(z) = \mathbf{I}, \tag{26}$$

and if $\mathbf{R}(z) = \tilde{\mathbf{E}}(z)$, then (25) is automatically satisfied. 

DFT Filter Bank 

The Discrete Fourier Transform filter bank is a special class of filter bank. 
This class of filter banks can be viewed as a set of equally spaced bandpass 
filters obtained by modulating a prototype lowpass filter. The $M \times M$ DFT 
matrix $\mathbf{W}$ has elements $[\mathbf{W}]_{km} = W^{km}$ where $W = e^{-j2\pi/M}$. A DFT filter bank is 
defined such that every filter $H_k(z)$ is related to a single prototype $H_0(z)$ via 
$H_k(z) = H_0(zW^k)$. Around the unit circle, the $k$-th subband filter is the shifted 
version $H_0(e^{j(\omega - 2\pi k/M)})$ of $H_0(e^{j\omega})$. We can express the set of the $M$ impulse 
responses $h_k(n)$ as 

$$h_k(n) = h_0(n) W^{-kn}. \tag{27}$$

The simplest DFT filter bank is simply an analysis filter bank with 

$$H_k(z) = H_0(zW^k), \tag{28}$$

where 

$$H_0(z) = 1 + z^{-1} + \cdots + z^{-M+1}, \tag{29}$$

as shown in Fig. 16. 

For an arbitrary prototype filter with a polyphase decomposition as in (13), 
the polyphase decomposition of the DFT filter bank is 

$$H_k(z) = H_0(zW^k) = \sum_{l=0}^{M-1} \left(z^{-1}W^{-k}\right)^l E_l(z^M) = \sum_{l=0}^{M-1} \left(z^{-l}E_l(z^M)\right) W^{-kl}. \tag{30}$$




Equation (30) is realized by using the structure shown in Fig. 17. It should 
be noted that the DFT here is performed on each block, where each block is a 
single input vector as in (8), corresponding to each time index of the subband 
signals. By taking advantage of a radix-2 choice for $M$, it is possible to use the 
fast Fourier transform (FFT) to perform the matrix multiplication shown in 
Fig. 17. Another important advantage, evident from (30), is that only one 
prototype filter is designed and the subband filters are simply the modulated 
versions of the same prototype. 
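
A decimated DFT analysis bank of this kind can be sketched as follows. This is an illustrative example and not the report's design: the function names, the random prototype filter, and the block size are assumptions, and $W = e^{-j2\pi/M}$ is assumed so that $h_k(n) = h_0(n)W^{-kn}$. The sketch compares direct filtering by each modulated filter with the polyphase branches followed by an inverse DFT.

```python
import numpy as np

def dft_bank_direct(x, h0, M):
    """Filter with each modulated filter h_k(n) = h0(n) W^{-kn}, then decimate by M."""
    n = np.arange(len(h0))
    n_out = len(x) // M
    rows = []
    for k in range(M):
        h_k = h0 * np.exp(2j * np.pi * k * n / M)       # W = exp(-2j*pi/M)
        rows.append(np.convolve(x, h_k)[::M][:n_out])
    return np.array(rows)

def dft_bank_polyphase(x, h0, M):
    """Same outputs from M low-rate polyphase branches followed by an inverse DFT."""
    n_out = len(x) // M
    xp = np.concatenate([np.zeros(M - 1), x])           # xp[i] = x(i - (M - 1))
    u = np.zeros((M, n_out), dtype=complex)
    for l in range(M):
        x_l = xp[M - 1 - l::M][:n_out]                  # x_l(m) = x(mM - l)
        e_l = h0[l::M]                                  # l-th polyphase component
        if e_l.size:
            u[l] = np.convolve(x_l, e_l)[:n_out]
    return M * np.fft.ifft(u, axis=0)                   # sum over l of u_l(m) W^{-kl}

rng = np.random.default_rng(5)
M = 4
h0 = rng.standard_normal(12)
x = rng.standard_normal(64)
print(np.allclose(dft_bank_direct(x, h0, M), dft_bank_polyphase(x, h0, M)))
```

Since `np.fft.ifft` includes a factor of $1/M$, the result is multiplied by $M$ to match the unnormalized sum in (30). Only one prototype filter is stored, and all branch arithmetic runs at $1/M$ of the input rate.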

Blocked Digital Filter 


Another structure to realize a digital filter is the Blocked Digital Filter. 
Consider the scheme of Fig. 18, where $\mathbf{H}(z)$ is an $M \times M$ transfer function 
matrix [8], and $\mathbf{Y}_B(z) = \mathbf{H}(z)\mathbf{X}_B(z)$. It can be shown that this system is a linear 
time invariant system, and can be described by a scalar transfer function $H(z)$ 
iff $\mathbf{H}(z)$ is a pseudo circulant matrix. That is 

$$\mathbf{H}(z) = \begin{bmatrix} E_0(z) & E_1(z) & \cdots & E_{M-1}(z) \\ z^{-1}E_{M-1}(z) & E_0(z) & \cdots & E_{M-2}(z) \\ \vdots & \vdots & \ddots & \vdots \\ z^{-1}E_1(z) & z^{-1}E_2(z) & \cdots & E_0(z) \end{bmatrix}. \tag{31}$$

The pseudo circulant property means that $\mathbf{H}(z)$ is constructed such that every 
row of the matrix is obtained from a circular shift of the previous row, and 
the elements below the main diagonal are multiplied by $z^{-1}$. Notice that Fig. 
18 also represents a general maximally decimated filter bank (compare with 
Fig. 15). In filter bank language, we can therefore say that the system is 
alias free if and only if $\mathbf{R}(z)\mathbf{E}(z)$ is pseudo-circulant. In this case the 
input-output relation is $Y(z) = H(z)X(z)$ where 

$$H(z) = z^{-(M-1)} \sum_{l=0}^{M-1} z^{-l} E_l(z^M). \tag{32}$$

In block filtering, we can refer to the block filter of Fig. 18 as an 
implementation of the scalar filter $H(z)$. 
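
As a small worked instance of the pseudo circulant structure, consider $M = 3$. The layout below assumes the standard convention of [8], in which each row is a right circular shift of the row above it; it is given only as an illustration of (31) and (32), not as a result quoted from the report.

```latex
\mathbf{H}(z) =
\begin{bmatrix}
E_0(z)       & E_1(z)       & E_2(z) \\
z^{-1}E_2(z) & E_0(z)       & E_1(z) \\
z^{-1}E_1(z) & z^{-1}E_2(z) & E_0(z)
\end{bmatrix},
\qquad
H(z) = z^{-2}\left[ E_0(z^3) + z^{-1}E_1(z^3) + z^{-2}E_2(z^3) \right].
```

Only the three entries of the first row are independent, so under this assumption the $3 \times 3$ block filter is fully determined by the polyphase components of the scalar filter.
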
The block digital filtering formalism provides a possible realization of a filter 
for moving the filtering operations to a lower rate. This structure can also be 
used when there is no rate conversion preceding or following the filtering 
operation. However, it is difficult to fabricate hardware for a set of digital 
filters to perform block digital filtering in the matrix form. Block digital 
filtering is an expensive operation due to matrix filtering operations, 
particularly when the number of subbands $M$ is large. The complexity of block 
digital filtering applied to our problem is assessed in Section VII.1. 


Subband Convolution Theorem 

The analog of the convolution theorem for using filter banks is outlined in [9] 
and the idea is illustrated in Fig. 19. The subband convolution theorem for filter 
banks permits the convolution of two signals using the subband signals. In 
using this approach for computing the convolution, the decimated version of the 
convolution result is obtained as shown in Fig. 19. Given an analysis/synthesis 
filter bank $\{H_k, F_k\}$ for $0 \le k \le M-1$ that forms a perfect reconstruction (e.g., 
bi-orthonormal) system, the theorem states that using the set of filter banks 
$\{H_k, F_k\}$, the $M$-fold decimated version of the convolution $x(n) * y(n)$ can be computed 
by computing the convolutions $x_k(n) * y_k(n)$ for all $k = 0, \ldots, M-1$, and then 
summing the results. That is 

$$(x(n) * y(n))|_{\downarrow M} = \sum_{k=0}^{M-1} (x_k(n) * y_k(n))|_{\downarrow M} \tag{33}$$

The interested reader may refer to [9] for detailed discussion of properties of 
the subband convolution theorem. This concludes our preliminaries and now 
we can begin to consider the problem of parallelization of our receiver. 
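
The subband convolution idea can be illustrated with a simple orthonormal block-transform filter bank. The sketch below rests on several assumptions that are not taken from the report: the analysis filters are the rows of a random orthonormal matrix `Q`, the synthesis filters are their time-reversed conjugates, $x$ is analyzed with the $H_k$ and $y$ with the $F_k$, and both sets of subband signals are decimated by $M$ before the low-rate convolutions. With these causal filters the result is the decimated convolution delayed by $M - 1$ input samples.

```python
import numpy as np

def subband_convolution(x, y, Q):
    """Sum over k of x_k * y_k, with the rows of the orthonormal matrix Q as analysis filters."""
    M = Q.shape[0]
    total = None
    for k in range(M):
        h_k = Q[k]                        # analysis filter of length M
        f_k = np.conj(Q[k, ::-1])         # synthesis filter (time-reversed conjugate)
        x_k = np.convolve(x, h_k)[::M]    # decimated subband signal of x
        y_k = np.convolve(y, f_k)[::M]    # decimated subband signal of y
        term = np.convolve(x_k, y_k)      # convolution at the low rate
        total = term if total is None else total + term
    return total

rng = np.random.default_rng(4)
M = 4
Q, _ = np.linalg.qr(rng.standard_normal((M, M)))     # random orthonormal matrix
x = rng.standard_normal(30)
y = rng.standard_normal(17)

s = subband_convolution(x, y, Q)
delayed = np.concatenate([np.zeros(M - 1), np.convolve(x, y)])   # delay of M - 1 samples
ref = delayed[::M]                                               # M-fold decimation

n = max(len(s), len(ref))
s = np.pad(s, (0, n - len(s)))
ref = np.pad(ref, (0, n - len(ref)))
print(np.allclose(s, ref))
```

All convolutions inside the loop operate on sequences that are roughly $M$ times shorter than the inputs, which is what makes the approach attractive for a parallel receiver.
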
III. Parallel Architecture for Demodulator 


Digital filter bank theory offers a number of different methods for realizing a 
digital filter with a transfer function H(z), at a lower processing rate than the 
input sampling rate. Here, we begin by describing the underlying problem,
and then derive a class of digital filter banks that is well suited to our
application.

The demodulation and filtering operations are performed as shown in Fig. 20.
The heterodyning receiver uses a mixer, shown here as a multiplier, to
translate the signal to baseband, and the filter H(z) is used to remove the
double frequency images produced by the mixing operation. In this figure, f_c
is the estimated carrier frequency. In the PRX, the mixing operation is
performed in the subbands, as illustrated in Fig. 11.
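The single-path demodulator described above can be sketched as follows. This is only an illustration under stated assumptions: the windowed-sinc design, the tap count, the cutoff, and the names lowpass_fir and demodulate are ours and are not taken from the report.

```python
import numpy as np

def lowpass_fir(num_taps, cutoff):
    """Windowed-sinc lowpass FIR; cutoff in cycles/sample (0 < cutoff < 0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = 2.0 * cutoff * np.sinc(2.0 * cutoff * n) * np.hamming(num_taps)
    return h / h.sum()  # normalize to unit gain at DC

def demodulate(x, fc_hat, h):
    """Mix x(n) to baseband with the estimated carrier, then filter with H(z)."""
    n = np.arange(len(x))
    mixed = x * np.exp(-2j * np.pi * fc_hat * n)   # multiplier (mixer)
    return np.convolve(mixed, h, mode="same")      # H(z) rejects the double frequency image

# Illustrative values (assumptions): carrier at 0.2 cycles/sample, phase 0.7 rad
fc, phase = 0.2, 0.7
n = np.arange(2000)
x = np.cos(2 * np.pi * fc * n + phase)
y = demodulate(x, fc, lowpass_fir(101, 0.1))
mid = y[200:-200]   # discard filter edge effects
```

Away from the edges, the output settles near 0.5*exp(j*phase), i.e., the baseband term that survives once the double frequency image is rejected.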

It is possible to decimate the output of the filter, since the bandwidth of the 
filter is always less than the total bandwidth of the input signal x(n). When 
bandpass sampling is used and the input signal is sampled at the minimum rate
(f_s = 4W), it is shown in [4] that J=2 can be used in conjunction with
a half-band filter for implementing H(z). Here we present the parallel
implementation of the demodulator of Fig. 20. This implementation must
satisfy the properties listed in Table III.1.

Filter Design Requirements

(1) Phase linearity of the filter is essential for tracking
the phase and for extracting information such as Doppler
effects.

(2) Minimal distortion of the signal while maximally
rejecting the double frequency images, with no additional
loss in the new structure when compared to the
traditional filtering approach.

(3) Facilitate mixing and filtering operations at the lowest
possible rate; i.e., all arithmetic operations (additions and
multiplications) are performed at the lowest possible rate.

(4) Another desirable (but not necessary) property of the
parallel demodulator is to provide a discrete time
sequence corresponding to each subband. The signal
bandwidth is divided into evenly spaced subbands. This
requirement translates into an analysis filter bank
that is essentially a set of bandpass filters tuned to
equally spaced center frequencies and having equal
bandwidth.


Table III.1. Filter Design Requirements

The fourth requirement broadens the scope of application of the demodulator,
particularly for multi-carrier modulation systems or existing deep space
mission modulation formats in which a subcarrier is present. By having
access to the subband signals, with the linear phase property, a carrier signal
can be accessed directly from the subband and fed to a DPLL for tracking
purposes. Returning to the parallelization of the demodulator, Table III.2
summarizes the various approaches, their merits, and their shortcomings.

Possible Approaches


(1) The blocked digital filtering method described in Section II.b
can be applied for parallelizing the computation, which results in
a matrix filtering operation using the pseudo-circulant matrix in
equation (31). However, the fourth design requirement in Table III.1
cannot be fulfilled by this method.

(2) The digital filter bank approach has classically been employed
in applications of subband coding [8]. In these applications of
filter banks, an allpass function is the desired frequency response
of the system. In our case of interest, a lowpass function is the
desired frequency response of the system. Here the fourth
property is readily fulfilled as an added feature at no extra cost to
the overall system.

(3) The subband convolution theorem, which is another
digital filter bank approach. The application of this filter bank is
computationally even more complex than the second approach.


Table III.2. Possible Approaches to Parallel Realization of Demodulator


Based on the arguments listed in Table III.2, we are led to consider the digital
filter bank solution based on the second approach. Typically, there are three
sources of distortion in filter banks: aliasing, amplitude distortion,
and phase distortion. In a maximally decimated filter bank, the input signal is
split into M subband signals x_k(n) by M analysis filters H_k(z), as shown in
Fig. 14. In the case of maximally decimated filter banks, it can be shown that
there exists a class of perfect reconstruction filters, referred to as M-channel
Quadrature Mirror Filters (QMF) [8], which eliminate all three distortions for
full band reconstruction. However, the QMF filter bank cannot meet all four of
our design criteria simultaneously while maintaining perfect reconstruction.


We refer to our filtering problem here as a "partial band reconstruction", as
opposed to perfect reconstruction, in which case the objective is to reconstruct
the signal around the whole unit circle. This may be accomplished by
considering the filter banks in Fig. 14, covering the full band [0, 2\pi), and
simply keeping a subset of the synthesis filters and discarding the rest, as
shown in Fig. 21. The remaining subset constitutes the passband of the overall
system, and the discarded set constitutes the stopband.

Let \Omega = [0, 2\pi) denote the frequency domain, let
E = {H_0(z), H_1(z), ..., H_{M-1}(z)} denote the set of M real-valued analysis
filters, and let the non-overlapping frequency interval

    I_k = [ (\pi/M) k, (\pi/M)(k+1) )

denote the frequency support of the k-th filter, with the center frequency

    f_k = (\pi/M)(k+1) - \pi/(2M),   for k >= 0,

where \Omega = \bigcup_{k=0}^{M-1} I_k. Discarding the outputs of the analysis
filters k = j, ..., M-1 is equivalent to a lowpass filter, i.e., to weighting the
frequency response over the interval \bigcup_{k=j}^{M-1} I_k with zero. The
filter bank shown earlier in Fig. 14 is re-drawn in Fig. 22 to demonstrate the
idea of dropping a subset of inputs to the synthesis bank for realizing a
lowpass frequency response.

In applying maximally decimated filtering, aliasing effects must be
considered. In a perfect reconstruction filter bank, this aliasing is effectively
canceled. The aliasing error due to dropping a subband is illustrated in Fig. 23.

In Fig. 23, the subset of synthesis filters F_i(z) for i > k+2 and i < k is
discarded. The aliasing error in the signal is not canceled in the frequency
bands where the adjacent synthesis filters are discarded. This effect is
exhibited in Fig. 23 in the frequency intervals I_k and I_{k+2}.

In order to deal with the aliasing, we can use an oversampled filter bank, i.e.,
use more filters without increasing the decimation rate. This choice
translates into decreasing the bandwidth of the analysis filters compared to
the maximally decimated case. This class of filter banks is referred to as
non-maximally decimated filter banks [8]. We begin by assigning a frequency
support (passband) to each filter,

    I_k = [ (\pi/M) k - \pi/(2M), (\pi/M) k + \pi/(2M) ),

for complex subbands with \Omega = [0, 2\pi), with center frequency
f_k = (\pi/M) k, where k = 0, ..., 2M-1. This frequency allocation (or filter
stacking) doubles the number of analysis and synthesis filters. However, since
the decimation rate is kept at M, the separation between the center frequency
of the images of each analysis filter relative to its passband is doubled. The
idea of using the non-maximally decimated filter bank, and of obtaining a
lowpass filter by dropping subbands, is demonstrated in Fig. 24 for the case
M = 3 and 2M = 6. The DFT filter bank provides an efficient realization of
non-maximally decimated filter banks. In the DFT filter bank, the subbands are
complex-valued signals.

In a non-maximally decimated filter bank, aliasing is not canceled, but it is
suppressed for all practical purposes. For applications of detection of signals
in noise, a wise choice for the amount of this suppression is to ensure that the
aliasing level is far below the thermal noise level. The input and output of
the filter bank (with a single channel shown in Fig. 25b) are related by

    Y(z) = (1/M) \sum_{l=0}^{M-1} X(z W^l) \sum_{k=0}^{2M-1} H_k(z W^l) F_k(z).    (34)

Note that the decimation ratio in this case is M, which is half the number of
subband channels, 2M. The distortion function for a full band reconstruction
of a non-maximally decimated filter bank, assuming that the aliasing can be
neglected, is

    T(z) = (1/M) \sum_{k=0}^{2M-1} H_k(z) F_k(z).    (35)

A by-product of this approach is the wider spacing between the images and the
main signal, which provides ample room to eliminate the images. This
enables the application of synthesis filters that may have a wider transition
bandwidth than their analysis counterparts. The synthesis filter F_k(z) may be
designed to have a wide transition bandwidth and thus can be implemented
with lower complexity. If {H_k(z)} is a set of ideal brickwall filters, then the
passband of F_k(e^{j\omega}) is the interval

    I_k = [ (\pi/M) k - \pi/(2M), (\pi/M) k + \pi/(2M) ),

and the transition band is accordingly




The frequency support and the inter-relationship of the transition and
passband support of a non-maximally decimated analysis/synthesis filter
bank are depicted in Fig. 25. In Fig. 25a, the frequency domain support of each
analysis filter is depicted. An end-to-end single channel of the non-maximally
decimated filter bank is shown in Fig. 25b, and the transition bandwidth of the
synthesis filter is shown in Fig. 25c, with the frequency support expressed in
equation (36).

We choose a 2M-channel non-maximally decimated DFT filter bank and recall that
all the subbands are obtained from a set of uniformly shifted versions of a
single prototype filter, i.e., H_k(z) = H(z W_{2M}^k) and F_k(z) = F(z W_{2M}^k).
Ideally, H_k(z) and F_k(z) are designed such that the distortion function in (35)
reduces to a constant value. This can be achieved if the sum of the frequency
responses of the analysis filters H_k(z) is a constant and the synthesis filter
F_k(z) is designed with linear phase and a wide enough passband to pass the
subband signal undistorted. The condition imposed on the distortion function
in (35) is then reduced to

    \sum_{k=0}^{2M-1} H_k(z) F_k(z) = \sum_{k=0}^{2M-1} H_k(z) = 2Mc.    (37)


It can be shown that this condition is satisfied by a special class of filters
referred to as Nyquist filters. These filters are obtained by choosing an
impulse response for the prototype filter such that

    h(2Mn) = c   for n = 0,
           = 0   otherwise.    (38)

The Nyquist filters as defined in (38) are also referred to as 2M-th band
filters. The condition in equation (37) requires the impulse response of the
filter to have periodic zero crossings separated by 2M samples.
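The Nyquist property and its consequence for the DFT filter bank can be checked numerically. The sketch below is illustrative only: the windowed-sinc prototype, the values of M and K, and the zero-phase indexing (taps centered on n = 0) are our assumptions, not the design used in this report.

```python
import numpy as np

M = 4                      # decimation ratio (illustrative)
K = 6                      # prototype has 4*K*M + 1 taps (illustrative)
n = np.arange(-2 * K * M, 2 * K * M + 1)

# Windowed sinc with zero crossings every 2M samples; a window preserves them.
h = np.sinc(n / (2.0 * M)) * np.hamming(len(n)) / (2.0 * M)

c = h[n == 0][0]                                  # h(0) = c in equation (38)
zero_taps = h[(n % (2 * M) == 0) & (n != 0)]      # h(2Mn) for n != 0

# Modulated filters h_k(n) = h(n) W^{-kn}, with W = exp(-j*2*pi/(2M)), k = 0..2M-1
k = np.arange(2 * M)[:, None]
h_k = h[None, :] * np.exp(2j * np.pi * k * n[None, :] / (2 * M))
total = h_k.sum(axis=0)                           # impulse response of sum_k H_k(z)
```

Here total reduces to a single tap of value 2Mc at n = 0, which is the time domain counterpart of the constant on the right-hand side of (37).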

In summary, our design guidelines for implementation of the filtering
operation are listed in Table III.3.

Design Guidelines

(1) H_0(z) must be linear phase and Nyquist (2M).

(2) F_0(z) must be linear phase and may have a large
transition bandwidth, as specified in equation (36). The
passband of F_0(z) must contain the passband of H_0(z) plus
its transition band.

(3) Both the analysis and synthesis filter banks are
implemented using a DFT filter bank, i.e., H_k(z) = H_0(z W_{2M}^k)
and F_k(z) = F_0(z W_{2M}^k).


Table III.3. Design Guidelines for Design of Filter Bank


A more detailed discussion of the filter design for both the analysis and
synthesis banks using the above design guidelines can be found in Section IX.
We can now draw the overall block diagram of the demodulator, as shown in
Fig. 26b. In Fig. 26a, the original model is shown as a reference for comparing
the two structures.

An example of a filter bank that satisfies the design guidelines is outlined
here for the case M = 16. The frequency responses of H_k(z) and F_k(z) are
illustrated in Fig. 27 for three filters (k = 0, 1, and 31). In these figures,
the horizontal axis is \omega/(2\pi).

IV. Digital Matched Filtering

The ideal digital matched filter for detection of signals in AWGN is the
classical correlator shown in Fig. 7. In the digital implementation, an
integrate-and-dump filter (IDF) is used to approximate the correlator output,
as shown in Fig. 28. The number of samples per symbol for full response
signaling (i.e., when p(t - kT) = 0 for t \notin [kT, (k+1)T)) is N = T/T_s.

The digital IDF detects the k-th symbol by summing all the N samples taken
from t = kT + \tau_0 to t = (k+1)T + \tau_0. When sampling the received signal
at a constant sampling rate, the i-th sample occurs at time iT_s. For the k-th
symbol, the "offset in sampling" is defined as the length of time from the start
of the symbol to the first sample within the symbol. This time is
\delta = iT_s - (kT + \tau_0), where i is the smallest integer such that
iT_s - (kT + \tau_0) is non-negative, and \tau_0 < T_s is the timing offset. The
first sample of each symbol may occur anywhere between 0 and T_s seconds after
the beginning of the symbol.
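The definition of the sampling offset can be illustrated with a few lines of exact arithmetic. The function name first_sample and the numerical values of T, T_s, and \tau_0 below are hypothetical; they are chosen only so that T/T_s is not an integer and the offset changes from symbol to symbol.

```python
from fractions import Fraction
from math import ceil

def first_sample(k, T, Ts, tau0):
    """Return (i, delta): index of the first sample in symbol k and its offset."""
    start = k * T + tau0            # start of the k-th symbol
    i = ceil(start / Ts)            # smallest i with i*Ts - start >= 0
    return i, i * Ts - start

T = Fraction(1)                     # symbol period (illustrative)
Ts = Fraction(3, 10)                # sampling period (illustrative)
tau0 = Fraction(1, 10)              # timing offset, tau0 < Ts
offsets = [first_sample(k, T, Ts, tau0) for k in range(4)]
```

With these values the offset takes the values 1/5, 1/10, 0, and 1/5 for k = 0, 1, 2, 3, always within [0, T_s).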

A typical symbol waveform and the sampling points are shown in Fig. 29 for 
the case when a rectangular pulse shape is the transmitted waveform. 

There are two ways to deal with the offset. The first approach is to
synchronize the sampling clock with the symbol clock. This is not desirable
in space communication applications, since the sampling clock is synchronized
with an ultrastable clock source (such as a MASER) and is used to time tag
the carrier phase estimate for ranging applications. The second and more
versatile approach is to use, virtually, a finer granularity in the time domain
than the sampling period. The finer resolution in time is used only for
pre-computing the matched filter coefficients during the design phase.
Conceptually, we begin by expanding the input signal by L to obtain a
matched filter with a higher resolution, and then decimate by L. The
derivation leads to a matched filter that is time varying. The weight
sequence is matched to the transmitted pulse shape p(t), i.e.,

    w_i = p((iT_s/L) - kT + \delta).

Note that the discrete time index i of the weight sequence w_i varies at the
rate L/T_s. The output rate of the matched filter is the symbol rate; hence,
the expanded rate is decimated to the input rate by L and then to the symbol
rate by D (note that here D = N). The integer delay d = L\delta translates into
a fraction of the sampling period from the beginning of a symbol period. This
delay is estimated by measuring the offset of the expanded sampling clock with
respect to the phase of the NCO in the symbol synchronization loop. This delay
varies much more slowly than the symbol rate. For every value of d, we can
formulate the matched filter as a linear time invariant (LTI) system denoted by
Q_d(z), as illustrated in Fig. 30. The application of the polyphase identity of
Fig. 13c (since the decimation and expansion rates are equal in Fig. 30)
enables us to model the matched filter as an LTI system with the transfer
function Q_d(z) and inverse z-transform q_d(n). In the following section, the
parallelized version of the matched filtering operation in Fig. 30 is
considered.


IV.1 Combined Demodulator and Matched Filter

An efficient implementation of the matched filter is obtained by combining 
the matched filtering operation with the demodulator filter bank. The key 
advantage in combining the demodulation filter bank with the matched filter 
is to use the same subband signals produced in the demodulation stage, 
which are already parallelized and sampled at the lowest rate. For simplicity 
in illustrating the combined structure, we ignore the mixing operation for 
converting the input signal x(n) to baseband, which has already been
outlined in Section III. The methodology here is first to design the subband 
matched filters assuming an allpass characteristic of the demodulation filter 
bank, and then later we revert to the previous approach of discarding the 
subbands for synthesizing a lowpass characteristic. 

In what follows, the derivation of the combined demodulator and matched filter
is presented step by step. We begin by considering the allpass filter bank
followed by the matched filter in Fig. 31a. The analysis, matched, and
synthesis filters are shown in Fig. 31a, with transfer functions H_k(z), Q_d(z),
and F_k(z), respectively (recall that r(n) was originally defined in Fig. 7).
The matched filtering operation can be commuted with the synthesis banks, as
shown in Fig. 31b.

Let q_d(n) denote the impulse response of the matched filter with the
z-transform Q_d(z). It can be shown that the block diagram shown in Fig. 32 is
equivalent to the one shown in Fig. 31b. Note that any other filter with
appropriate design criteria may be used for computing this convolution.
Hence, the synthesis filters have been changed from F_k(z) to F_k'(z), even
though the designer may elect to use the same filters. The Fourier
transforms at points © and © in Fig. 32 are shown in Fig. 33. In Fig. 33, the
output of F_k'(z) is denoted by P_k(z), where P_k(z) = Q_d(z) F_k'(z).

The next step in the derivation is the inclusion of a decimator-interpolator
pair prior to the convolution and the addition of the synthesis filter F_k(z)
for demodulation after the convolution, as shown in Fig. 34. In Fig. 33, the
frequency characteristics of the signals at various points of Fig. 32 and
Fig. 34 are illustrated. When F_k(z) and H_k(z) have sufficient stopband
attenuation and appropriate passband width and ripple, then for all practical
purposes the systems in Fig. 34 and Fig. 32 produce the same output. The Fourier
transform at point © is the product of the Fourier transforms of the signals at 
points @ and It is clear that at point @ the frequency support of the 
signal is similar to that of the analysis filter. In the same way, the Fourier 
transform at point 5 is the product of the Fourier transforms at points @ and 
@. Then, the k-th image in point © is equal to the signal in point @. The 
filter F_k(z) will only pass the k-th subband signal and reject the remaining
images, making the signals at points @ and © approximately equal.
Inaccuracies may only result from non-ideal frequency responses of the filters.


The idea now is to apply the convolution identity shown in Fig. 35 to move
the expansion operation to the last stage. Applying this identity
results in all arithmetic operations being performed at the lowest possible
rate, prior to rate expansion. In Fig. 35, the convolution of the two
signals x_1(n) and x_2(n), performed at the high rate, is reduced to a
convolution at a rate lower by a factor of M. Applying this multirate identity
to the scheme of Fig. 34, we obtain the subband version of the matched filter
shown in Fig. 36.
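A small numerical check of this idea is given below. It assumes that the identity of Fig. 35 states that convolving two M-fold expanded signals is the same as expanding the low-rate convolution; the figure itself is not reproduced here, so this reading, the helper name expand, and the test signals are our assumptions.

```python
import numpy as np

def expand(x, M):
    """M-fold expander: insert M-1 zeros after each sample."""
    y = np.zeros(len(x) * M, dtype=x.dtype)
    y[::M] = x
    return y

rng = np.random.default_rng(0)
M = 4
x1 = rng.standard_normal(16)
x2 = rng.standard_normal(10)

high_rate = np.convolve(expand(x1, M), expand(x2, M))   # convolve at the high rate
low_rate = expand(np.convolve(x1, x2), M)               # convolve first, expand last
```

The two results agree sample by sample (the high-rate result only carries a few extra trailing zeros), while the low-rate path performs its multiplications at 1/M of the rate.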

We use the efficient implementation of a DFT filter bank as shown in Fig. 17.
In a non-maximally decimated filter bank, the output of the 2M-channel filter
bank is decimated by M. This in turn requires z to be replaced by z^2 in the
polyphase filters. The resulting structure is shown in Fig. 37a. In this figure,
R_i(z) = E''_{M-1-i}(z), where E''_i(z) is the i-th polyphase component of the
synthesis filter F_0(z), E_i(z) denotes the i-th polyphase component of the
analysis filter, and E'_i(z) denotes the i-th polyphase component of the filter
bank used for performing the subband convolutions.

The computations leading to the set of signals s_k(n) can be performed off-line,
if desired. That is, Q_d(z) F_k'(z) is computed and the result is decimated and
stored in the time domain. In Fig. 37b, the convolution in the subband
signals is replaced with a subband matched filter, denoted here as G_k(z) for
k = 0, ..., 2M-1, where G_k(z) is a filter with impulse response s_k(n).

The synthesis part can be further simplified by noting that the output of the
DFT (matrix multiplication by W) is composed of 2M points and the
expansion rate is M. Hence, we can now re-organize the system of Fig. 37 as
shown in Fig. 38, where the addition of the output sequences of the synthesis
bank is performed at the lower rate. For effective implementation of the
interleaver structure in Fig. 38, the interleaver can also be reduced to a
multiplexer, whereby each symbol output a_k is associated with a specific
subband. This is the subject of the next section.

IV.1.a Filter Bank Output Multiplexing to Symbols

Let g denote the greatest common divisor of M and D (shown in Fig. 39a),
i.e., g = gcd(M, D). Then there exist \bar{M} and \bar{D} such that
M = g\bar{M} and D = g\bar{D}, where \bar{M} and \bar{D} are relatively prime.
So there exist integers n_0 and n_1 such that n_0\bar{M} + n_1\bar{D} = 1. It
can then be shown that the expander followed by the delay chain and decimator
shown in Fig. 39a is equivalent to Fig. 39b.

Let i denote the channel number corresponding to the i-th subband, and
define l_i and r_i as the integer part and the remainder modulo \bar{M},
respectively, of the delay associated with the i-th channel. Since
gcd(n_1, \bar{M}) = 1, r_i \neq r_j for all i \neq j, i.e., each r_i is unique.
Using the multirate identity depicted in Fig. 39, the overall structure may be
drawn as shown in Fig. 40. It is noted that, for a fixed symbol rate, the delay
length for each subband is fixed, hence reducing the interleaver to a
multiplexing circuit and a routing switch.
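The number-theoretic facts used here can be verified with a short sketch. The values of M and D are illustrative, and the form r_i = (i n_1) mod \bar{M} is an assumption made only for this illustration; what the sketch shows is that multiplication by n_1 permutes the residues modulo \bar{M}, so the remainders are distinct.

```python
from math import gcd

def bezout(a, b):
    """Extended Euclid: return (g, s, t) with s*a + t*b == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, s, t = bezout(b, a % b)
    return g, t, s - (a // b) * t

M, D = 12, 8                         # illustrative decimation ratios
g = gcd(M, D)
M_bar, D_bar = M // g, D // g        # relatively prime factors
_, n0, n1 = bezout(M_bar, D_bar)     # n0*M_bar + n1*D_bar == 1

# Assumed form of the remainders (hypothetical): r_i = (i * n1) mod M_bar
r = [(i * n1) % M_bar for i in range(M_bar)]
```

Because n_0\bar{M} + n_1\bar{D} = 1 forces gcd(n_1, \bar{M}) = 1, the list r contains every residue 0, ..., \bar{M}-1 exactly once.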

IV.1.b Efficient DFT Computation of Synthesis Filters

Further reduction in the FFT computation of the matrix multiplication with
W can be attained by noting that only a subset (2\bar{M} branches) of the
output branches is needed for obtaining the output symbol sequence. Let

    X(k) = \sum_{n=0}^{2M-1} x(n) W_{2M}^{nk},

where k = g-1, 2g-1, ..., 2M-1, and M = g\bar{M}, i.e., k = mg-1 with
m = 1, ..., 2\bar{M}. Then, it is easy to verify that the FFT may be decomposed
into two parts, that is,

    X(mg-1) = \sum_{n_1=0}^{2\bar{M}-1} W_{2M}^{n_1(mg-1)} \sum_{n_0=0}^{g-1} x(2\bar{M}n_0 + n_1) W_g^{-n_0}.

In this case, a g-point FFT can be used to compute the inner DFT for each
n_1, and then a 2\bar{M}-point FFT is performed to compute the final result. This
concludes our discussion of combined demodulation and digital matched
filtering.
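The two-stage computation can be checked against a full FFT. The sketch below assumes the forward DFT convention of numpy.fft.fft, i.e., W_N = exp(-j 2\pi/N), and uses illustrative values of g and \bar{M}; the function name pruned_dft is hypothetical. For brevity, the inner stage evaluates only the single needed bin of the g-point DFT by a direct sum.

```python
import numpy as np

def pruned_dft(x, g):
    """Return X(m*g - 1) for m = 1..2*M_bar, where len(x) = 2*M = 2*g*M_bar."""
    two_M = len(x)
    two_Mb = two_M // g
    n0 = np.arange(g)[:, None]
    n1 = np.arange(two_Mb)
    x2 = x.reshape(g, two_Mb)                      # x2[n0, n1] = x[2*M_bar*n0 + n1]
    # Inner sum over n0: one bin of a g-point DFT for every n1 (factor W_g^{-n0})
    inner = (x2 * np.exp(2j * np.pi * n0 / g)).sum(axis=0)
    # Twiddle factor W_{2M}^{-n1}, followed by a 2*M_bar-point FFT over n1
    outer = np.fft.fft(inner * np.exp(2j * np.pi * n1 / two_M))
    m = np.arange(1, two_Mb + 1)
    return outer[m % two_Mb]                       # m = 2*M_bar wraps to bin 0

rng = np.random.default_rng(1)
g, M_bar = 4, 3                                    # illustrative: 2M = 24
x = rng.standard_normal(2 * g * M_bar) + 1j * rng.standard_normal(2 * g * M_bar)
full = np.fft.fft(x)
k = (np.arange(1, 2 * M_bar + 1) * g - 1) % len(x)  # k = g-1, 2g-1, ..., 2M-1
```

The pruned result matches full[k], while the second stage uses a transform of length 2\bar{M} rather than 2M.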

IV.2 IFIR Approximation for Digital Matched Filtering

The results of this section can be used independently of the results in other 
sections of this report. The following method may be incorporated in the 
filter bank structure when the matched filter has a high order and requires a 
large number of taps. This method effectively reduces the number of taps at 
the input rate by realizing an equivalent filter as described here in this 
section. 

We begin by considering the filtering operation shown in Fig. 41. Our
approach here is based on interpolated finite impulse response
(IFIR) filtering. This means that we use the following decomposition to
realize an FIR approximation to the matched filter in the frequency domain,

    H_MA(z) = G(z^{L/2}) I(z),    (39)

where L is an integer referred to as the stretch factor. This decomposition
results in an efficient implementation of a narrowband lowpass filter. Let N
represent the order of the filter required to meet the specification for
H_MA(z), and let \Delta f be the transition width of the original filter. The
model filter G(z) has transition bandwidth L\Delta f/2, so its order is reduced
to approximately 2N/L. This translates into substantial savings in
multiply-accumulate operations, by a factor of L/2. The unwanted shifted
images of G(z^{L/2}) are then suppressed with a filter denoted here as I(z). This
filter has a very wide transition bandwidth [8] and requires a low order.
The overall filter is shown in Fig. 42.

It is possible to decompose the stretched filter G(z) into even and odd
polyphase components and re-draw the system shown in Fig. 42a as depicted
in Fig. 42b. After some manipulation, we arrive at the system shown
in Fig. 43, where the even and odd polyphase components of G(z) are denoted
G_0(z) and G_1(z), respectively, and the L/2 polyphase components of I(z) are
denoted I_0(z), ..., I_{L/2-1}(z).

The IFIR digital matched filter described here is designed by considering the
impulse response of the digital matched filter, denoted here as h(n). The
problem can be formulated as a least squares optimization problem as
follows: given an arbitrary impulse response h(n) of order N, find \hat{h}(n)
such that

    \sum_n |h(n) - \hat{h}(n)|^2    (40)

is minimized, subject to the constraint

    \hat{H}(z) = G(z^{L/2}) I(z),    (41)

where the impulse response of G(z) is g(n) of order N_g, the impulse
response of I(z) is i(n) of order N_f, and N = N_f + (L/2)N_g. We begin by
rewriting the constraint (41) in the time domain, using the fact that
multiplication in the frequency domain is equivalent to convolution in the
time domain. Let

    \hat{h} = [\hat{h}(0), \hat{h}(1), ..., \hat{h}(N-1)]^T,    g = [g(0), g(1), ..., g(N_g-1)]^T,    (42)


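and define the matrices

    K = \begin{bmatrix}
        i(0)     & 0        & \cdots & 0        \\
        i(1)     & i(0)     & \ddots & \vdots   \\
        \vdots   & i(1)     & \ddots & 0        \\
        i(N_f-1) & \vdots   & \ddots & i(0)     \\
        0        & i(N_f-1) &        & i(1)     \\
        \vdots   & \ddots   & \ddots & \vdots   \\
        0        & \cdots   & 0      & i(N_f-1)
    \end{bmatrix},    (43)

and

    S = \begin{bmatrix}
        1      & 0      & \cdots & 0      \\
        0      & 0      & \cdots & 0      \\
        \vdots & \vdots &        & \vdots \\
        0      & 1      & \cdots & 0      \\
        \vdots & \vdots & \ddots & \vdots \\
        0      & 0      & \cdots & 1
    \end{bmatrix},    (44)

where L/2-1 rows of zeros separate consecutive rows of S that contain a 1.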


Then, we can write

    \hat{h} = KSg.    (45)

Thus, minimizing equation (40) is equivalent to minimizing

    ||h - \hat{h}||^2 = ||h - KSg||^2.    (46)

When the columns of the matrix KS are linearly independent, so that
(KS)^T (KS) is nonsingular, the least squares solution is given by the relation

    g = ((KS)^T (KS))^{-1} (KS)^T h.    (47)

The matrix ((KS)^T (KS))^{-1} (KS)^T is referred to as the pseudo-inverse or the
Moore-Penrose generalized inverse of the matrix KS. Due to the construction of
the matrix KS in our problem, the existence of a solution is always guaranteed.

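Similarly to equation (47), we can find the best f, where f collects the
coefficients of I(z), such that the objective function
\sum_n |h(n) - \hat{h}(n)|^2 is minimized. Let

    D = \begin{bmatrix}
        g(0)     & 0        & \cdots & 0        \\
        g(1)     & g(0)     & \ddots & \vdots   \\
        \vdots   & g(1)     & \ddots & 0        \\
        g(N_g-1) & \vdots   & \ddots & g(0)     \\
        0        & g(N_g-1) &        & g(1)     \\
        \vdots   & \ddots   & \ddots & \vdots   \\
        0        & \cdots   & 0      & g(N_g-1)
    \end{bmatrix}
    and  f = [f(0), f(1), ..., f(N_f-1)]^T,    (48)

then the objective function in (46) is minimized by

    f = (D^T D)^{-1} D^T h.*    (49)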


The optimization procedure is summarized below:


Design Algorithm for IFIR Matched Filter:

(1) Pick an initial guess for f(n), e.g., f(n) = 1 for all 0 <= n < N_f.

(2) Use equation (47) to optimize g.

(3) Use equation (49) to optimize f.

(4) If \sum_n |h(n) - \hat{h}(n)|^2 is not acceptable, go to (2).


Suppose h(n) has a stopband edge of about \pi/a. Then L is chosen between a/2
and 3a/2, and the optimization solution is good in general.
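The alternating procedure above can be sketched numerically as follows. This is an illustration, not the authors' implementation. The function names, the small problem size, and the use of numpy.linalg.lstsq in place of explicitly forming the pseudo-inverse are our choices. The integer s plays the role of L/2, "length" is used where the text says "order", and the matrix D is built from the zero-stretched sequence so that D f equals the convolution of g stretched by s with f (an assumption about the intended construction).

```python
import numpy as np

def conv_matrix(c, n_cols):
    """Toeplitz matrix T with T @ v == np.convolve(c, v) for len(v) == n_cols."""
    T = np.zeros((len(c) + n_cols - 1, n_cols))
    for j in range(n_cols):
        T[j:j + len(c), j] = c
    return T

def stretch(g, s):
    """Insert s-1 zeros between the taps of g, i.e., G(z) -> G(z**s)."""
    out = np.zeros((len(g) - 1) * s + 1)
    out[::s] = g
    return out

def ifir_design(h, n_g, n_f, s, n_iter=10):
    """Alternate least squares updates of g and f for h ~ stretch(g, s) * f."""
    f = np.ones(n_f)                                  # step (1): initial guess
    S = np.zeros(((n_g - 1) * s + 1, n_g))
    S[::s, :] = np.eye(n_g)                           # zero-insertion matrix S
    errors = []
    for _ in range(n_iter):
        KS = conv_matrix(f, S.shape[0]) @ S
        g = np.linalg.lstsq(KS, h, rcond=None)[0]     # step (2), cf. equation (47)
        D = conv_matrix(stretch(g, s), n_f)
        f = np.linalg.lstsq(D, h, rcond=None)[0]      # step (3), cf. equation (49)
        errors.append(np.sum((h - D @ f) ** 2))       # step (4), cf. equation (46)
    return g, f, errors

# Small illustrative problem: a rectangular pulse padded with zeros
h = np.zeros(24)
h[6:18] = 1.0
g, f, errors = ifir_design(h, n_g=9, n_f=8, s=2)
```

Since each step solves a least squares problem with the other factor held fixed, the recorded error cannot increase from one iteration to the next.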


*Formulation of this iteration algorithm is due to Yuan-Pei Lin, graduate student, California
Institute of Technology.

In Fig. 44, an example is provided for a rectangular pulse. The parameters are
chosen to be N = 90, L = 4, and N_f = 24, and the original impulse response h(n) is
simply a square pulse with the first twenty and the last twenty samples set to
zero. The approximation of this impulse response is difficult, and rather
interesting, because of the Gibbs phenomenon. The approximation error in equation
(46) in this example after 10 iterations is only 1.4 x 10^{-6}, which is a relatively
small value.

We conclude from this example that, even for a small number of iterations,
this algorithm yields a small error for the IFIR approximation of the original
filter.

V. Symbol Timing Recovery

Symbol timing recovery is accomplished here by using a digital transition 
tracking loop (DTTL). The DTTL utilizes a matched filter over the duration 
of the transition epoch of a symbol, giving rise to the name mid-phase 
matched filter. The block diagram for the DTTL is shown in Fig. 45. The 
product of the output of the mid-phase matched filter with the output of a 
transition detector (as will be defined shortly) provides the timing phase 
error. This error is further averaged, filtered and used by the NCO to 
generate a square wave which is used as the reference recovered symbol 
clock. 

The timing phase error is e_k = b_k r_k, where r_k is the output of the mid-phase
matched filter, that is, r_k = \sum_{n \in \Psi} w_n x_n. Here the set
\Psi = {n : kT - \Delta < nT_s < kT + \Delta} has cardinality N = |\Psi|,
2\Delta is the window size, and the transition detector output is

    b_k = (Sign(a_{k-1}) - Sign(a_k)) / 2
        =  0   if a_k = a_{k-1},
          +1   if a_k = -1, a_{k-1} = +1,
          -1   if a_k = +1, a_{k-1} = -1.

The output of the transition detector determines the sign of the phase error.
The timing phase error e_k is averaged over many symbols and further filtered,
as shown in Fig. 45. The steady state and transient behavior of this loop can
be found in [11].
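The error computation of the DTTL can be sketched in a few lines. The function names and the sample values below are hypothetical; the loop filter and the NCO are omitted, and the mid-phase matched filter outputs r_k are simply supplied as a list.

```python
def sign(v):
    """Hard decision on a symbol estimate."""
    return 1 if v >= 0 else -1

def transition_detector(a_prev, a_curr):
    """b_k = (Sign(a_{k-1}) - Sign(a_k)) / 2, taking values in {-1, 0, +1}."""
    return (sign(a_prev) - sign(a_curr)) // 2

def timing_errors(symbols, midphase):
    """e_k = b_k * r_k for k = 1, 2, ...; midphase[k] is r_k at the k-th boundary."""
    return [transition_detector(symbols[k - 1], symbols[k]) * midphase[k]
            for k in range(1, len(symbols))]

symbols = [+1, -1, -1, +1]            # illustrative symbol estimates a_k
midphase = [0.0, 0.3, 0.5, -0.2]      # illustrative mid-phase outputs r_k
errors = timing_errors(symbols, midphase)
```

When there is no transition (b_k = 0), the mid-phase output carries no timing information and the error is zero; otherwise b_k corrects the sign of r_k.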

The ideal outputs of the integrate-and-dump filter (IDF) and the mid-phase
IDF are illustrated in Fig. 46 for an all-ones symbol sequence. The saw-tooth
behavior in the ideal case is due to "dumping" the content of the IDF at the
end of each integration period. The digital matched filtering operation can be
implemented as a convolution, like an FIR filter. When no windowing is used,
the output of the mid-phase matched filter (length T seconds) can be obtained
from the continuously running matched filter by simply sampling the
matched filter output every T/2 seconds, as shown in Fig. 46.

It has been shown [11] that the loop SNR of the DTTL can be improved by
applying a matched filter over a narrower time epoch than the full symbol
duration. This is achieved by windowing the input sequence around the
transition epoch of the samples within a symbol period. In general, the
duration of the window \Delta is defined as an even sub-multiple (e.g., 1/2, 1/4,
1/8, etc.) of the symbol period. Thus, the window sequence becomes


    \Pi(n,k) = 1   for kT - \Delta/2 < nT_s < kT + \Delta/2,
             = 0   otherwise.    (50)


The mid-phase matched filter, in the windowed case, is implemented
separately from the matched filter, with an impulse response given by

    h_MP(n) = \Pi(n,k) h_MF(n).    (51)

Using the truncated impulse response h_MP(n) of the new matched filter,
derived from h_MF(n) according to equation (51), the results on parallelization
of the matched filter in Section IV become directly applicable here by
replacing w_n with h_MP(n).
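The windowing step can be illustrated as follows. The function name midphase_filter, the sampling grid, and the rectangular h_MF(n) are assumptions made for the sake of a small example; only the window definition and the product in (51) come from the text.

```python
import numpy as np

def midphase_filter(h_mf, k, T, Ts, delta):
    """h_MP(n) = Pi(n, k) * h_MF(n); Pi = 1 for kT - delta/2 < n*Ts < kT + delta/2."""
    t = np.arange(len(h_mf)) * Ts
    window = ((t > k * T - delta / 2) & (t < k * T + delta / 2)).astype(float)
    return window * h_mf

T, Ts = 1.0, 0.125              # 8 samples per symbol (illustrative)
h_mf = np.ones(16)              # illustrative response on a two-symbol sample grid
h_mp = midphase_filter(h_mf, k=1, T=T, Ts=Ts, delta=T / 2)
```

With a window of half a symbol centered on the transition at t = T, only the three samples at t = 0.875, 1.0, and 1.125 survive.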


In view of the above, we can summarize our results as follows. When no
windowing is applied, the mid-phase matched filter output can be obtained by
simply sampling the matched filter output at half-symbol intervals. In
the windowed case, the problem reduces to the parallelization of a
separate matched filter, which was addressed in Section IV. The DTTL can be
incorporated into the subbands as shown in Fig. 47. In this figure, M' is used
to denote the decimation rate of each parallel path. The parameter M' is
essentially the ratio of the symbol rate to the desired processing rate of the
parallel DTTL. The designer may select M' = D, if desired.

VI. Carrier Phase Estimation and Costas Loop

We begin by considering a Costas loop for phase estimation and tracking. 

The basic structure of a Costas loop is shown in Fig. 48. The performance of 
the Costas loop is available in many textbooks such as [5]. The squaring loss, 
tracking error variance, and the S-curve associated with using this loop are 
discussed in [10]. 

Let \Delta\phi = (\theta - \hat{\theta}) represent the phase error between the actual and the
estimated carrier phase. The in-phase and quadrature components of the
output of the matched filter ( fi(a k )), respectively can be written as: 

y c (n ) = ( a k + n£)cos(A0) + n s k sin(A^) 

y s (n) = ( a k + n k )s in(A0) - n s k cos(A <j>) (52) 

The bandpass discrete time noise terms n k , n s k in (52) are defined in [5]. The 
output of the phase detector is the product of the two sequences y c (n), and 
y s (n ); which can be approximated as % k - a k sin 2 (6-6). For a binary signal 
and small phase error % k -2(8-6). The phase error is further integrated and 
filtered by an infinite impulse response (HR) filter to track the phase 
perturbations in the carrier phase of the received signal. This phase estimate 
is then used by the NCO to generate the reference in-phase and quadrature 
components. 
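
The phase detector of equation (52) can be sketched numerically as follows. This is a hedged illustration, not the loop used in the report: the loop filter is replaced by an assumed first-order update with a hypothetical gain, the noise terms default to zero, and the names `phase_detector` and `track` are invented for the example. In the noise-free case the product of the two arms equals $(a_k^2/2)\sin 2\Delta\phi$; constant scale factors of this kind are absorbed by the loop gain.

```python
import math


def phase_detector(a_k, dphi, n_c=0.0, n_s=0.0):
    """Product of the in-phase and quadrature arms of equation (52)."""
    y_c = (a_k + n_c) * math.cos(dphi) + n_s * math.sin(dphi)
    y_s = (a_k + n_c) * math.sin(dphi) - n_s * math.cos(dphi)
    return y_c * y_s


def track(theta, symbols, gain=0.1):
    """Assumed first-order loop: accumulate the scaled detector output.

    theta is the true carrier phase; the function returns the final
    phase estimate after one update per symbol.
    """
    theta_hat = 0.0
    for a_k in symbols:
        theta_hat += gain * phase_detector(a_k, theta - theta_hat)
    return theta_hat


# Noise-free BPSK symbols with a 0.3 rad carrier phase offset.
estimate = track(0.3, [1.0, -1.0] * 100)
```

Because the detector output depends on $a_k^2$, the estimate converges regardless of the data sign, which is the property that lets the Costas loop track a suppressed carrier.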

The loop update rate is the rate at which the output of the phase detector is 
fed into the loop filter. It has been shown in [10] that the digital 
implementation of the phase locked loop requires the ratio of the update 
rate to the loop bandwidth to be larger than ten. As an example, for a loop 
bandwidth of 50 Hz, a minimum update rate of 500 Hz is necessary. 
Otherwise, the loop behavior will differ from that of its analog counterpart. 

The structure shown in Fig. 49 for the parallelized Costas loop architecture 
corresponds to a loop update rate of 1/M'' of the symbol rate. It is noted 
that the loop bandwidth in most applications is much lower than the data 
rate. In this figure, M'' denotes the decimation rate of each parallel path; 
it is the ratio of the symbol rate to the desired processing rate of the 
parallel Costas loop. The designer may select M'' = D, if desired. 

VII. Alternative Architectures 

In this section, three different approaches to parallelization of a single 
filter followed by a decimator, as shown in Fig. 50, are re-examined. Let M 
denote the decimation ratio between the input sampling rate and the 
processing rate, let D denote the output decimation rate (which in matched 
filtering is equivalent to the number of samples per half symbol), and let L 
denote the filter length of H(z). For simplicity, we assume M is a multiple 
of D. We consider three options: 

(1) Direct Parallel Architecture: based on the blocking method for a digital 
filter as described in Section II, and illustrated in Fig. 18. 

(2) Frequency Domain Convolution: using a DFT to perform the convolution 
of H(z) with X(z) as shown in Fig. 51. This approach has classically been 
used [8] to compute linear convolution and is referred to as the "overlap 
and save" method. 

(3) Filter Bank Approach: based on the filter bank structure derived in 
Sections III and IV. 


VII.1. Complexity and Computational Analysis 

The computational complexity of the three options for parallelization of the 
digital filter followed by the decimator is assessed. The computational 
complexity of each option is stated in terms of the number of real 
multiplication operations needed at the low sampling rate. 

In option 1, the matrix filtering entails a total of ML real multiplications 
when all the coefficients are non-zero. Here, only 1/D of the rows need to be 
implemented, leading to a reduction of the complexity by the same factor. 

In option 2, the frequency domain convolution requires two FFTs of size 
M+L, and an additional M+L complex multiplications, as shown in Fig. 51. The 
total complexity for this option is given in the second row of Table VII.1. 

In option 3, the complexity of the filter bank approach is approximated by 
assuming that the order of the synthesis filter bank, as well as the bank 
used for generating the subband filters, is 8M, and that the order of the 
analysis filter bank is 12M. These approximations have been derived 
empirically and represent a rough figure for the order of the filter bank. 
The total complexity of the polyphase components of the analysis filter 
$H_0(z)$ is equal to the complexity of implementing $H_0(z)$, i.e. 12M 
multiplications. The same applies to $F_0(z)$, but only 1/D of the polyphase 
components need to be implemented. The order of each subband's filter is 
about (8M+L)/M. There are M such filters with complex coefficients. There 
are also two FFTs of size 2M. The result is summarized in Table VII.1. 


Option                          Operations 

I. Block Digital Filtering      $\dfrac{ML}{D}$ 

II. Frequency Domain            $2(M+L) + 2(M+L)\log_2(M+L)$ 

III. Filter Bank                $4M\log_2(2M) + 28M + \dfrac{8}{D}M + 2L$ 


Table VII.1. Complexity of Each Option 


The expressions for the number of operations of these three options are 
plotted in Fig. 52. In Fig. 52, each option is represented as a region in the 
two-dimensional plane whose coordinates are L and M. Each region 
represents the range of L and M for which that option yields the minimal 
complexity among the three. It is interesting to note that for small L, 
block digital filtering results in the lowest number of operations. 
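
The comparison behind Fig. 52 can be reproduced approximately by evaluating the three operation counts directly. The sketch below uses the expressions as read from Table VII.1 (ML/D, 2(M+L) + 2(M+L)log2(M+L), and 4M log2(2M) + 28M + 8M/D + 2L). The function names and the sample values of M, L and D are assumptions chosen for illustration, and the FFT size M+L is not rounded up to a power of two.

```python
import math


def block_ops(M, L, D):
    """Option I: block digital filtering, only 1/D of the rows kept."""
    return M * L / D


def freq_domain_ops(M, L):
    """Option II: two FFTs of size M+L plus M+L complex multiplications."""
    n = M + L
    return 2 * n + 2 * n * math.log2(n)


def filter_bank_ops(M, L, D):
    """Option III: two FFTs of size 2M, analysis, synthesis and subband filters."""
    return 4 * M * math.log2(2 * M) + 28 * M + 8 * M / D + 2 * L


def cheapest(M, L, D):
    """Name of the option with the fewest real multiplications."""
    costs = {
        "block": block_ops(M, L, D),
        "frequency domain": freq_domain_ops(M, L),
        "filter bank": filter_bank_ops(M, L, D),
    }
    return min(costs, key=costs.get)


short_filter = cheapest(M=16, L=8, D=2)      # short impulse response
long_filter = cheapest(M=16, L=1000, D=2)    # long impulse response
```

For these assumed values the block method wins for the short filter and the filter bank wins for the long one, consistent with the remark about small L. Since the block count grows as (M/D)L and the filter bank count as 2L plus a constant, the crossover exists only when M/D exceeds two.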

VIII. Delay 

The total delay of the receiver is composed of the group delay plus the 
processing delay. Our proposed structure of the PRX employs only FIR 
prototype filters with linear phase. Hence, the overall system group delay 
for carrier and timing synchronization is constant. 

The processing delay in a digital demodulator plays an important role. If the 
delay is too large, it can lead to faulty behavior in the synchronization loops 
(carrier or timing). The processing delay of the PRX structure is composed of 
the delays caused by the analysis and synthesis filter banks, plus the delay 
in the FFT and inverse FFT computation, plus the delay in the subband 
matched filters. Each FFT corresponds to a delay of $\log_2(2M)$; the delay 
for the analysis and synthesis filters used in the PRX is the delay of each 
prototype, respectively. The delay of the matched filtering in the subband is 
the sum of the delays caused by the matched filter prototype and the delay 
of $F_0(z)$. The total delay in the PRX is the sum of the individual delays 
for the analysis, synthesis, FFT and inverse FFT, and matched filtering 
operations. 

IX. Simulation of PRX 

In this section, the performance of the PRX architecture is verified by 
simulation. The model simulated here is shown in Fig. 37b, with synthesis 
section as depicted in Fig. 38. In this section r(n) is the complex output of the 
IF demodulator, as shown in Fig. 7. The real part of output of the filter bank 
matched filter after decimation by the integer D is used for detecting the 
symbol sequence. Both real and imaginary parts of the output of the matched 
filter are used for obtaining the phase error estimate for closing the Costas 
loop. The Costas loop implementation is similar to the classical model shown 
in Fig. 5. 

IX.1. Filter design. 

In implementing the PRX architecture, three filters have to be designed, as 
shown in Fig. 37b. The filters are $H_0(z)$ and $F_0(z)$, which are the 
prototypes for all the filters $H_k(z)$ and $F_k(z)$, and the matched filters 
$G_k(z)$, for $k = 0, \ldots, 2M-1$. The filters $E_k(z)$ are the type-1 
polyphase components (refer to equation (13)) of $H_0(z)$, and $R_k(z)$ are 
the polyphase components of $F_0(z)$, with change of index such that 
q(n) = h(nM - 1). 

The filter $H_0(z)$ must be designed under the constraint of being Nyquist 
(2M) (refer to equation (37)). By considering the support of each of the 
signals in Fig. 33, the filter $H_k(z)$ must reject the images of $P_k(z)$, 
which is derived from the convolution of $F_k(z)$ and the matched filter. 
Since we have no control over the matched filter, we will assume that 
$P_k(z) = F_k(z)$. This is equivalent to assuming that the matched filter 
bandwidth is wider than the synthesis filter bandwidth. Hence, the stopband 
of $H_k(z)$ must include the transition band of the adjacent images of 
$F_0(z)$. This requirement is equivalent to setting a limit of $\pi/M$ for the 
sum of the transition regions of $H_0(z)$ and $F_0(z)$, as shown in Fig. 53. 
The effect of $F_0(z)$ on the complexity of the system is more significant, 
since it also impacts the length of the impulse response of the subband 
matched filters $G_k(z)$, which have the highest number of taps among all 
the filters used in the PRX. Effective reduction of the order of $F_0(z)$ can 
be achieved if the prototype analysis filter $H_0(z)$ is designed with a sharp 
transition band. This choice of $H_0(z)$ allows $F_0(z)$ to have a wider 
transition and thus be lower in complexity. An additional requirement on the 
synthesis filter $F_0(z)$ is that it possess a wide enough passband with low 
ripple to cover the transition region of $H_0(z)$. We have chosen the order of 
$H_0(z)$ to be 12M and the order of $F_0(z)$ to be 8M. Then each of the 2M 
polyphase components of $H_0(z)$ has length six, and each of those of 
$F_0(z)$ has length four. These orders are not a minimal choice. It is 
possible to reduce the lengths of $H_0(z)$ and $F_0(z)$ by better filter 
design optimization techniques. 

The approach taken here for designing $H_0(z)$ is windowing the impulse 
response of an ideal filter. The window chosen here is the Hamming window, 
to provide a smooth response and a non-equal-ripple stopband. In our 
application, monotonically increasing stopband attenuation (or equivalently 
non-equal ripple) is a desired property, since it results in further rejection 
of images distant from the filter cut-off frequency. This property ensures 
that only the neighboring filters contribute to the aliasing distortion. It 
must be noted that by choosing the bandwidth to be $\pi/2M$, the resulting 
filter is forced to be Nyquist-2M, independent of the window shape used for 
designing the filter bank. 
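
The Nyquist(2M) property and the polyphase lengths quoted above can be checked with a short numerical sketch. The code below is an assumed reimplementation of the windowing method in plain Python, not the MathCad worksheet of Appendix A; the helper names `lowpass_hamming` and `H` are hypothetical. Following the appendix, it uses M = 8, a 95-tap Hamming-windowed ideal lowpass with cutoff $\pi/2M$, one leading zero so that the center tap sits at index 3B = 48 (with B = 2M), and the same arbitrary test point 1.45 + 0.52j.

```python
import cmath
import math

M = 8
B = 2 * M            # number of filters in the bank
N_h = 6 * B - 1      # 95 taps, as in Appendix A


def lowpass_hamming(num_taps, cutoff):
    """Hamming-windowed ideal lowpass; cutoff is in cycles per sample."""
    mid = (num_taps - 1) // 2
    taps = []
    for i in range(num_taps):
        n = i - mid
        if n == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * n) / (math.pi * n)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * w)
    return taps


# Cutoff pi/2M rad/sample equals 1/(2B) cycles per sample.
h = lowpass_hamming(N_h, 1 / (2 * B))
h0 = [0.0] + h       # length 12M = 96, center tap at index 3B = 48

# Type-1 polyphase split into 2M components of six taps each.
polyphase = [h0[k::B] for k in range(B)]


def H(k, z):
    """k-th analysis filter H0(z * W_B^k), with W_B = exp(j*2*pi/B)."""
    zz = z * cmath.exp(2j * math.pi / B) ** k
    return sum(c * zz ** (-i) for i, c in enumerate(h0))


# Nyquist check: the B filters should sum to a pure delay z^(-3B).
alpha = 1.45 + 0.52j
total = sum(H(k, alpha) for k in range(B))
```

The taps at index 48 ± 16m fall on the zeros of the ideal response, so windowing cannot disturb them; this is why the Nyquist property holds for any window shape.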

The filter $F_0(z)$ must have a symmetric impulse response for the linear 
phase property. It must also have a wide enough bandwidth to preserve the 
Nyquist property of $H_0(z)$. Recall that $F_0(z)$ should also satisfy, 
together with $H_0(z)$, the requirement illustrated in Fig. 53. In our design 
example, the stopbands of both $H_0(z)$ and $F_0(z)$ provide better than 
60 dB attenuation. 

The construction of the subband matched filter $G_k(z)$ is as follows. Let 
$q_d(n)$ be the desired matched filter impulse response. Let $s_k(n)$ be the 
impulse response of $G_k(z)$. The complex-valued sequence $s_k(n)$ is 
obtained by passing the sequence $q_d(n)$ through the filter $F_k(z)$ and 
decimating the output by M. This idea is shown in Fig. 36. Here, we have 
chosen $F_k'(z) = F_k(z)$ for simplicity. The computation of $s_k(n)$ begins 
by computing $f_k(n)$, which is the impulse response of $F_k(z)$. The impulse 
response $f_k(n)$ is calculated by multiplying $f_0(n)$ with $W_{2M}^{-kn}$. 
Then $q_d(n)$ is convolved with $f_k(n)$ and the result is decimated by M to 
obtain $s_k(n)$. 
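
The three steps just described (modulate the prototype, convolve with the desired response, decimate by M) can be sketched as follows. This is an illustration under assumptions: the function name `subband_matched_filters` is hypothetical, the modulating sequence is taken as exp(-j 2 pi k n / 2M) (the sign of the exponent is an assumption about the report's convention), and the four-tap prototype and two-tap integrate-and-dump response are toy values chosen so the result is easy to verify by hand.

```python
import cmath
import math


def subband_matched_filters(f0, q_d, M):
    """Return the impulse responses s_k(n) for k = 0 .. 2M-1.

    f0  : prototype synthesis filter taps f_0(n)
    q_d : desired matched filter impulse response q_d(n)
    M   : decimation ratio (the bank has 2M filters)
    """
    B = 2 * M
    filters = []
    for k in range(B):
        # Step 1: f_k(n) = f_0(n) * exp(-j*2*pi*k*n / 2M)
        f_k = [c * cmath.exp(-2j * math.pi * k * n / B)
               for n, c in enumerate(f0)]
        # Step 2: linear convolution of f_k(n) with q_d(n)
        full = [0j] * (len(f_k) + len(q_d) - 1)
        for i, a in enumerate(f_k):
            for j, b in enumerate(q_d):
                full[i + j] += a * b
        # Step 3: keep every M-th sample
        filters.append(full[::M])
    return filters


# Toy example: M = 2 (four subbands), 4-tap prototype, 2-sample integrate-and-dump.
s = subband_matched_filters([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], M=2)
```

For k = 0 the modulation is the identity, so the result is simply the decimated convolution of the prototype with the desired response; the other subbands are complex in general.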

IX.2. Generation of Input Signal. 

The RF input to the receiver is filtered by an analog band-pass filter and then 
sampled by an A/D converter, forming the input to the digital portion of the 
receiver. Since the simulation software can only simulate discrete time 
systems, we have used in the simulation of the analog portion a much higher 
sampling rate than the one used in the actual receiver, e.g. 100 times. In 
order to have a different sampling rate in each portion, the simulation of the 
analog part is executed separately and its output is saved into a file. This file 
is then used as the input to the simulation of the digital portion. The 
implementation of the input generation system is illustrated in Fig. 54, where 
an example with 4 samples per symbol is shown. A Gaussian filter was 
chosen to model the analog filter. This choice of filter was made for 
introducing low distortion of the transmitted pulses. The system generates 
base-band samples, but in the receiver portion these samples are up- 
converted to IF frequency and then demodulated to baseband again to fully 
model the IF downconversion stage needed in real applications. 


IX.3. Description of PRX used in the simulation. 

This input signal (from a file) is up-converted, and the IF signal is formed 
by multiplying the input signal with a sinusoid of frequency $f_s/4$. This 
signal, which is the input to the receiver, is demodulated by an NCO 
(Numerically Controlled Oscillator) to construct the real and imaginary 
components (or I and Q) of the demodulated baseband signal. This signal is 
then vectorized into a length-2M vector by a serial-to-parallel converter. 
Note that there is an overlap of M samples between every two consecutive 
vectors: the first M components of the n-th vector are the last M components 
of the (n-1)-th vector. In our Signal Processing Workstation (SPW) 
simulation, the number of filters in the bank is 2M = 16; hence, the 
decimation rate is M = 8. In a system with a 100 MHz sampling rate, the 
processing rate would be 12.5 MHz. 
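
The serial-to-parallel conversion with an overlap of M samples can be sketched in a few lines. The function name `serial_to_parallel` and the integer test stream are assumptions made for this example; the values M = 8 and a 100 MHz sampling rate are those quoted in the text.

```python
def serial_to_parallel(samples, M):
    """Split a sample stream into length-2M vectors that advance by M samples.

    Consecutive vectors therefore share M samples: the first M entries of
    vector n are the last M entries of vector n-1.
    """
    return [samples[start:start + 2 * M]
            for start in range(0, len(samples) - 2 * M + 1, M)]


M = 8
vectors = serial_to_parallel(list(range(40)), M)

# One vector is produced for every M input samples.
processing_rate_hz = 100e6 / M
```

Each vector is handed to the polyphase filters at the reduced rate, which is how a 100 MHz input stream is processed by hardware clocked at 12.5 MHz.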

The vectors are processed at the low sampling rate, which is $f_s/M$. Each 
of the components of the vector is filtered by the appropriate polyphase 
component of $H_0(z)$. Altogether, these filters are referred to here as the 
vector filter H. The resulting sequence of vectors output by H is processed 
by an FFT block, one at a time. The subbands (altogether consisting of a 
vector sequence) are masked such that only seven (7) bands are fed through 
the synthesis bank. This mechanism results in realizing a lowpass frequency 
characteristic from the filter bank, as described earlier in Section III. 
These seven bands cover the frequency region $-0.44\pi$ to $0.44\pi$ and 
reject all the frequencies $|\omega| > 0.5\pi$ (in our 100 MHz example, this 
bandwidth corresponds to a 22 MHz band). This symmetric arrangement of the 
subbands around the zero frequency (d.c.) manifests itself in a symmetric 
frequency response of the subband matched filters. The subbands are further 
processed by a vector filter G for subband matched filtering, as described in 
Section IV. The output of G is transformed back by an IFFT and processed by 
the synthesis vector filter F. The output vector of length 2M samples is 
combined to form a length-M vector by delaying half of the components and 
summing the latter half to the other half, as depicted in Fig. 38. This forms 
the parallel output of the matched filter. The output of the combined 
demodulator and matched filter is generated in parallel at the lower rate. A 
subset of these parallel outputs, specifically 1/D of them, is used for 
detecting the symbols and for closing the Costas carrier tracking loop. 
Recall that when M is a multiple of D, the symbol sequence has a one-to-one 
correspondence with the subband signals (refer to Section IV.1.a). 
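
The band edge quoted above follows from the subband spacing and the prototype cutoff, as the short calculation below shows. It is a sketch under assumptions: subbands are indexed in the usual DFT order (so negative frequencies appear at k close to 2M), the kept bands are k = 0, ±1, ±2, ±3, and each band is taken to extend to the prototype cutoff $\pi/2M$ on either side of its center. The variable names are hypothetical.

```python
M = 8
B = 2 * M                       # 16 subbands

# Keep the seven bands centered at 0, +/-1, +/-2, +/-3 times pi/M.
kept = [k for k in range(B) if min(k, B - k) <= 3]
mask = [1 if k in kept else 0 for k in range(B)]

# Frequencies expressed in units of pi rad/sample.
center_spacing = 2.0 / B        # pi/M between adjacent band centers
half_width = 1.0 / B            # prototype cutoff pi/2M
edge = 3 * center_spacing + half_width

# With 100 MHz sampling, pi rad/sample corresponds to 50 MHz.
edge_hz = edge * (100e6 / 2)
```

The edge evaluates to 0.4375π, or about 21.9 MHz at 100 MHz sampling, which agrees with the rounded figures of 0.44π and 22 MHz given in the text.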

In the following section, simulation results are presented for both the bit 
error rate (BER) and the mean square error (MSE) associated with the 
system described here. 

IX.4. Simulation results. 

The set of experiments is summarized here as follows. 

1. Partial band reconstruction. 

This test is intended to verify the reconstruction of a desired band by 
selection of the appropriate subbands. In this test, the filters $G_k(z)$ are 
set to unity (no subband matched filtering). The response of one or more 
subbands is computed by masking out (multiplying by zero) all the other 
bands. The frequency response of the system is obtained by applying a delta 
function at the input and computing the FFT of the output sequence from 
the filter bank. The results are shown in Fig. 55. 

In Fig. 55a, the response of the seven bands is shown. In Fig. 55b and 55c, 
two individual subbands are shown. The response in Fig. 55a is the 
summation of seven such individual responses. This experiment verifies the 
partial band reconstruction property of our filter bank used for 
demodulation. 

2. Combined demodulation and matched filtering. 

Here we demonstrate the impulse response of an integrate-and-dump filter, 
implemented by the filter bank, with subband masking included. A lowpass 
response is formed by allowing only seven subbands to be fed through, as 
described earlier in the first experiment. The integrate-and-dump response 
when incorporated into the filters $G_k(z)$ is depicted in Fig. 56. In this 
experiment, the proper operation of our structure for combined demodulation 
and matched filtering is confirmed. 

3. BER degradation of baseband filter-bank implementation. 

In this simulation our goal is to assess any losses associated with parallel 
realization of the matched filter. Here the Bit Error Rate (BER) is measured 
by simulation at baseband. In the baseband implementation, there is no 
demodulation stage and the subbands are not masked, since there are no 
double-frequency images to reject. In this simulation, we compared the BER 
performance of the ideal matched filter (integrate-and-dump) operating on a 
random BPSK signal in an AWGN channel with that of the filter-bank 
implementation of the same filter using the same signal. The analysis and 
synthesis filter lengths were chosen to be 12M and 8M, respectively. The 
result indicates that the difference between the two systems is too small to 
be observed even in very long runs (e.g. $10^7$ symbols at low SNR); thus the 
parallel realization results in negligible loss of performance. 

4. Mean Square Error (MSE) measurement. 

In order to further quantify the implementation error, in this simulation 
we use the Mean Square Error (MSE) criterion. The filter bank 
implementation error of the matched filter was measured by simulation. 
With a matched filter (integrate-and-dump) operating on a random signal and 
the filter-bank implementation of the same filter operating in parallel on 
the same input signal, we measure the average power of the difference 
between the two outputs. This constitutes the MSE measurement here, when the 
output power of the matched filter is normalized to unity. The very small 
MSE affirms the negligible loss in the BER measurement. The MSE is 
tabulated in Table IX.4.1. 


Order of $H_0(z)$      Order of $F_0(z)$      MSE 

16M                    12M                    $8.17 \times 10^{-6}$ 

12M                    8M                     $1.27 \times 10^{-5}$ 


Table IX.4.1. MSE Measurements 

5. BER measurement in IF simulation. 

The IF simulation arrangement was described earlier. In this 
simulation, the BER result is compared to the ideal theory when only 
four (4) complex samples per symbol are used for detecting the symbols. 
The Bandwidth-bit-time (BT) product of the simulated analog filter is 
1.5. The results are shown in Fig. 57. Note that the degradation shown 
in Fig. 57 is due only to the low number of samples per symbol [7]. 
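
The reference quantities used in experiments 3 to 5 can be sketched as follows. This is not the SPW simulation of the report: it models only the serial integrate-and-dump detector for BPSK in AWGN, compares it with the standard theoretical BER $\tfrac{1}{2}\,\mathrm{erfc}(\sqrt{E_b/N_0})$, and shows one plausible way to compute a normalized MSE between two filter outputs. The function names, the seed, and the choice of four samples per symbol are assumptions for illustration.

```python
import math
import random


def bpsk_ber_theory(ebn0_db):
    """Theoretical BER of coherent BPSK in AWGN with a matched filter."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))


def simulate_integrate_and_dump(ebn0_db, num_symbols, samples_per_symbol=4, seed=1):
    """Monte Carlo BER of a serial integrate-and-dump detector."""
    rng = random.Random(seed)
    ebn0 = 10 ** (ebn0_db / 10)
    # Unit-amplitude samples: Eb = samples_per_symbol, N0/2 = sigma^2.
    sigma = math.sqrt(samples_per_symbol / (2 * ebn0))
    errors = 0
    for _ in range(num_symbols):
        a = rng.choice((-1.0, 1.0))
        stat = sum(a + rng.gauss(0.0, sigma) for _ in range(samples_per_symbol))
        if (stat >= 0) != (a > 0):
            errors += 1
    return errors / num_symbols


def normalized_mse(reference, test):
    """Average power of the difference, normalized by the reference power."""
    err = sum(abs(r - t) ** 2 for r, t in zip(reference, test))
    ref = sum(abs(r) ** 2 for r in reference)
    return err / ref


theory_4db = bpsk_ber_theory(4.0)
simulated_4db = simulate_integrate_and_dump(4.0, 20000)
```

In a comparison of the kind described in experiment 4, `reference` would hold the serial matched filter output and `test` the filter-bank output for the same input.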

X. Future Direction for Research and Applications 

The effect of quantization and finite bit arithmetic on the overall system has 
to be further investigated. More specific design procedures may have to be 
formulated for low-sensitivity filter bank design. 

A second area of interest is to investigate and study possible approaches for 
efficient hardware realization of the receiver. Here, the merits of coarse 
(e.g. board level) or fine computational processes (e.g. using systolic 
arrays, custom VLSI chips, ASICs or others) for this application should be 
assessed. 

A third area of interest is to tailor the architecture of the PRX for multiple 
spacecraft applications. Missions involving multiple spacecraft within the 
same line of sight (or beam width of the antenna) could effectively employ a 
single receiver using our methodology. This is due to the natural 
decomposition of the input signal into non-overlapping frequency bands in 
the PRX. 

In applications with low-to-medium data rates, the PRX can be used to 
directly record the subband sequences from each filter bank onto a low speed 
recording medium (such as magnetic tape). The quantized subband 
sequences could then be transported over a communication link for further 
software processing at a remote site. The key to this utilization is that the 
subband sequences are output at a reduced rate, arbitrarily selected by the 
designer, without prohibitive constraints on the recording rate. 

Another area of future research is to augment this structure and derive a new 
class of architectures tailored for direct sequence spread spectrum 
communication using the multirate systems approach. 

XI. Conclusion 

In this report, we succeeded in formulating and devising an architecture for a 
digital receiver such that the processing rate in the digital signal processing 
hardware is arbitrarily selected by the designer. A brief overview of 
multirate and filter bank systems was presented. Each subsystem of a 
digital receiver was addressed: demodulation, matched filtering, and carrier 
and symbol synchronization. Specifically, an architecture was devised that 
operates at the low rate, and the detected symbol stream is directly output 
from the subbands. Various options for the implementation of the overall 
receiver were studied and their associated complexities were assessed. 
Simulation and numerical analysis of the PRX architecture were undertaken, 
and the symbol error rate obtained in this simulation indicates that there is 
no loss associated with the PRX when compared to the classical 
implementation of the receiver. 

Figure 2. Input Spectrum and IF Carrier for Bandpass Sampling 

Figure 8. Basic Operations: a. Decimation, 

Figure 10. Commutator Model 

Figure 12. Noble Identity for Multirate Building Block 

Figure 14. Maximally Decimated Analysis/Synthesis Digital Filter Bank 

Figure 20. Demodulation and Filtering 

Figure 22. Effects of Discarding in the Structure of DFB 

Synthesis Filters 

Figure 24. Subbands' Spectrums After the Decimation-Expansion in a 
Non-Maximally Decimated Filter Bank, M = 3 (0 ≤ k ≤ 2M - 1) 

Figure 25. Non-Maximally Decimated Filter Bank: a. Analysis Filter Bank 
Frequency Support, b. Subband Channel, c. Typical Frequency Response of 
Synthesis Filter with Relation to the Signal and its Images 

a. Concatenation of the Filter Bank with the Matched Filter, 
b. Commuting $Q_d(z)$ with $F_k(z)$ 

Figure 35. General Identity for Multirate Convolution 

Figure 39. Multirate Identity for Exchanging Decimation and Expansion: 
a. Original Model, b. Equivalent Model 

Figure 41. Digital Matched Filter Model 

Figure 43. IFIR Implementation of Matched Filtering 

Figure 45. Digital Transition Tracking Loop Block Diagram 

Figure 46. Comparison of Sampling Points: a. Output of IDF and Mid-Phase 
IDF Output, b. Output of Matched and Mid-Phase Matched Filter 

Figure 52. Complexity of Various Options for Parallelization of Filtering 
Operation versus L (Filter Order) and M (Number of Banks) 

Figure 53. Filter Design Specifications 

Figure 54. Input Generation for Simulation 

Appendix A. MathCad™ Software for Generation 
of Filter Bank Coefficients 

Appendix A: Filter Bank Receiver, Generation and Testing. 

A.1 Filter Design. 

B := 16                       Number of filters in the filter bank 
M := 8                        Decimation ratio (B = 2M) 

Analysis filter design: 

N_h := B·6 - 1                Length of analysis filter. 6 can be replaced by 
                              any even number, else fix the delays. 
setwindow(5) = 5              Set the window to Hamming. 
h := lowpass(1/(2·B), N_h)    This makes a B-band filter. The function lowpass 
                              generates a FIR by the windowing method. 
i := 1 .. length(h) 
k := 0 .. B - 1 
N_h = 95 
h0_i := h_(i-1)               Position the center of the impulse response at a 
h0_0 := 0                     multiple of B, for obtaining a delay which is a 
                              multiple of B. 

Figure A.1: Analysis filter H0(z) impulse response. 

n := 0 .. 99                  f_n := n/100 
W_B := exp(j·2·π/B) 
H0(z) := Σ_i z^(-i)·h0_i 
α := 1.45 + 0.52j 
H(k,z) := H0(z·W_B^k) 

Verifying that the filters sum to z^(-3·B), so they are indeed Nyquist 
filters; α is an arbitrary number. 

Σ_k H(k,α) = -6.71718·10^-10 + 7.19988·10^-10 i 
α^(-3·B)   = -6.71718·10^-10 + 7.19988·10^-10 i 


Synthesis filter design: 

setwindow(5) = 5 
N_f := B·4 - 1                   Length of synthesis filter, same remarks as above. 
ff := lowpass(1.19/(2·B), N_f) 
N_f = 63 
j := 1 .. length(ff) 
f0_j := ff_(j-1)                 Put the middle at a multiple of B, for having a 
f0_0 := 0                        delay which is a multiple of B. 

F0(z) := Σ_j z^(-j)·f0_j         F(k,z) := F0(z·W_B^k) 

Save prototypes in file for SPW simulation: 

WRITEPRN(f) := ff 
WRITEPRN(h) := h0 

Figure A.2: Synthesis filter F0(z) impulse response. 


Generating Gk(z): 

i := 0 .. 7 
x_i := 1                         x is the desired bank response. In this case x 
                                 is an integrate&dump of 8 samples. 
A_(k,j) := f0_j·W_B^(-j·k)       Generate all Fk(z) coefficients. 
lng := length(f0) + length(x) - 1 

Convolve Fk(z) with x: 

go^<k> := response(Re[(A^T)^<k>], x, lng) + j·response(Im[(A^T)^<k>], x, lng) 

m := 0 .. lng/M 
gd_(m,k) := go_(m·M,k)           Decimate the result. 

WRITEPRN(gi) := Im(gd)           Write to file. 
WRITEPRN(gr) := Re(gd) 

z_n := exp(j·2·π·f_n) 


A.2 Frequency domain tests. 

The frequency response of several analysis filters and one synthesis filter is 
shown in Figure A.3. 

Figure A.3: Frequency response of H0(z)...H3(z) and of F0(z). 

A test that the combined filters Hk(z)·Fk(z) do sum to a constant (the filters 
Fk(z) distort the Nyquist property of the filters Hk(z)) is shown in 
Figure A.4. In this figure, we also demonstrate how a frequency band is 
constructed out of a few subbands. 

V(k,z) := F(k,z)·H(k,z) 
p := 0 .. 4 

Figure A.4: Frequency band composition from several subbands. 

Adding decimation-interpolation to the picture. 

W_M := exp(j·2·π/M)              m := 0 .. M - 1 

Computing the frequency response of Hk(z) and Fk(z) followed by decimation 
by M and then interpolation by M. 

HH(k,z) := Σ_m H(k, z·W_M^m)     FF(k,z) := Σ_m F(k, z·W_M^m) 

HHFF(k,z) := HH(k,z)·FF(k,z)             The combined response with all the images. 
HHFF_F(k,z) := HH(k,z)·FF(k,z)·F(k,z)    The filter Fk(z) removes all the 
                                         undesired images. 

Figure A.5: Frequency response of a subband after 
decimation-interpolation by M. 

The response of the complete system. The system is supposed to compute the 
convolution of X and Q; both may be arbitrary signals. In reality X is the 
input to the receiver and Q is the matched filter response. We set here 
Fk'(z) = Fk(z). 

Arbitrary choice of X and Q: 

Q(z) := 3·z^-1 + 5·z^-2 + 8j·z^-3 
X(z) := 1 + z^-1 - 3·z^-4 

HX(k,z) := H(k,z)·X(z)               X passed through each one of Hk(z). 
FQ(k,z) := F(k,z)·Q(z)               Q passed through each one of Fk(z). 
HX1(k,z) := Σ_m HX(k, z·W_M^m)       Decimation-interpolation by M. 
FQ1(k,z) := Σ_m FQ(k, z·W_M^m)       Decimation-interpolation by M. 

Od(z) := X(z)·Q(z)                   The desired response (convolution of X with Q). 

O3(z) := Σ_k F(k,z)·HX1(k,z)·FQ1(k,z) 

The filter bank output. Here each subband is convolved, 
and then passed through the synthesis filter Fk(z). The 
result is summed to provide the output. 

Appendix B. SPW Block Diagrams 

Figure B-1. Filter Bank Receiver Block Diagram in SPW 

Figure B-2. Parallel Demodulator Block 

Figure B-4. Complex FIR Filter Block 

Figure B-5. Vector Multiply by 1 7-1 Block 

Figure B-6. Vector Filter Block (block parameter: VECTOR LENGTH: 16) 

Figure B-7. Input Generation SPW System Block Diagram (input parameters: 
samp. per sym. (beta): 8.0; frac. offset (alpha): 0) 

Appendix D. C Programs for Designing PRX Filter Banks 


Appendix D.1 Generating the Polyphase Filters 

I. 'poly_gen.c', C program for generating the polyphase filters' files for SPW 
from the ASCII files of the coefficients of H0(z) or F0(z). 

Appendix D.2 Generating the Subband Matched Filters 

II. 'fil_arr.c', C program for generating the Gk(z) filters' files for SPW from 
one ASCII file. 

Appendix D.3 Designing the IFIR Matched Filter 

III. 'ifir.c', C program for designing IFIR filters. 

while (!feof(filter_file)) { 
    /* read the next coefficient into the filter array */ 
    fscanf(filter_file, "%le", &b[len++]); 
    if (len >= 1000) { 
        printf("Filter too long"); 
        exit(0); 
    } 
} 